Frequently Asked Questions (FAQ)
Infrastructure as Code (IaC) can be really confusing! We’ve assembled this list of frequently asked questions (FAQ) to help guide you through the maze of topics and challenges that frustrate even the most seasoned DevOps practitioners.
These questions are based on a six-part discussion about lessons RackN learned scaling IaC.
In the series, Rob and Swapnil dive deep into the disciplines that build and scale an Infrastructure as Code practice.
What is RackN’s approach to building a system, as a developer experience?
For RackN, IaC signifies a developer process in which automation is built from reusable blocks. It has modular pieces and built-in tests that check repeatability. This means checking that automation is repeatable as part of the process, not just assuming that it will work because it worked once. In the same way that you test code, you should also test automation for every single case. “As code” means testing repeatability.
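One way to picture “testing repeatability” is a check that runs the same automation step more than once and confirms the result stops changing. The sketch below is only an illustration: `ensure_packages`, `is_repeatable`, and the dictionary used as system state are hypothetical names, not part of any RackN API.

```python
import copy

def ensure_packages(state, packages):
    """Hypothetical automation step: record each package as installed."""
    new_state = copy.deepcopy(state)
    installed = set(new_state.get("packages", []))
    installed.update(packages)
    new_state["packages"] = sorted(installed)
    return new_state

def is_repeatable(step, state, *args, runs=3):
    """Apply the step repeatedly; every run after the first must change nothing."""
    first = step(state, *args)
    current = first
    for _ in range(runs - 1):
        current = step(current, *args)
        if current != first:
            return False
    return True

start = {"packages": ["curl"]}
print(is_repeatable(ensure_packages, start, ["nginx", "curl"]))
```

A step that appends to a list on every run would fail this check, which is the kind of hidden drift that a "worked once" assumption misses.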
What is “distributed”?
Distributed means having infrastructure in many places or teams in many places. Infrastructure always ends up spanning multiple locations or sites, so it is better to build distributed infrastructure intentionally, with resiliency in mind.
What are “modules”?
Modules are components that are generic enough to be reused in multiple scenarios. This creates portability, which is the ability to reuse components. Modularity and “modules” can refer to various components like automation blocks/mechanisms or any modular code. Without modularity, people have to reinvent the wheel – make the same thing for themselves and then tweak it.
How do modularity and IaC allow for collaboration?
One benefit of modularity is that automation blocks can be reused to achieve consistent experiences. The entire point of modularity is to allow collaboration. IaC libraries provide modules that multiple different customers can use, reuse, customize, etc.
What is version control, and how does it relate to automation?
Version control records the unique information that identifies an automation mechanism. This allows the automation mechanism to become a reusable module.
What is immutability?
Immutability is a method of building a system from defined and reusable artifacts, rather than patching individual instances with one-off changes as you go. The artifacts are fixed when they are created, which is why they are called immutable artifacts.
Why is immutability important in IaC?
Unlike traditional Infrastructure Automation scripts that are unique (or modified) per environment, immutable artifacts allow operators to exactly copy automation between sites. By ensuring that tests, validation and performance from one site can be replicated to other sites, operators are able to enjoy a high degree of repeatability when they build, test and distribute automation.
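A common way to confirm that an artifact was copied exactly between sites is to compare content digests. The following is a minimal sketch using Python’s standard `hashlib`; the artifact bytes and the function names `artifact_digest` and `verify_copy` are assumptions made for illustration, not a description of how any particular tool does it.

```python
import hashlib

def artifact_digest(content: bytes) -> str:
    """Return the SHA-256 hex digest that identifies an artifact's exact content."""
    return hashlib.sha256(content).hexdigest()

def verify_copy(expected_digest: str, received: bytes) -> bool:
    """True only if the bytes received at another site match the published artifact."""
    return artifact_digest(received) == expected_digest

# Hypothetical artifact content, published once and then distributed.
published = b"workflow: provision\nversion: 1.2.0\n"
digest = artifact_digest(published)

print(verify_copy(digest, published))                  # an identical copy
print(verify_copy(digest, published + b"# hotfix\n"))  # a locally modified copy
```

Any per-site modification changes the digest, so a test result recorded against one digest only applies to copies that still match it.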
What is repeatability?
Repeatability is when an expected result occurs multiple times. Infrastructure repeatability focuses on getting a consistent result from automation, even though the underlying components differ from system to system. Various reusable modules make the process of building these systems easier.
How do you achieve consistency?
Consistency is better achieved by focusing on the same end result rather than forcing the same process every time. This normally means that you can’t stick to only one vendor. We acknowledge that paths are going to be different, and we make it possible to change paths to get the same results. Consistency occurs at the end, not at the beginning or within the process. Different reusable modules can be used to reach a consistent result.
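The idea of checking the end result rather than the path can be sketched as two different provisioning paths validated against one desired state. Everything here is hypothetical: the vendor functions, the `DESIRED` keys, and the values are invented for illustration.

```python
def provision_vendor_a(node):
    """Hypothetical path for vendor A."""
    return {"name": node, "os": "linux", "raid": "raid1", "configured_via": "bmc"}

def provision_vendor_b(node):
    """Hypothetical path for vendor B: different steps, same intended outcome."""
    result = {"name": node, "configured_via": "vendor-cli"}
    result["raid"] = "raid1"
    result["os"] = "linux"
    return result

# The end state every node must reach, regardless of how it got there.
DESIRED = {"os": "linux", "raid": "raid1"}

def meets_desired_state(result, desired=DESIRED):
    """Compare only the end-state keys; the path taken is deliberately ignored."""
    return all(result.get(key) == value for key, value in desired.items())

for provision in (provision_vendor_a, provision_vendor_b):
    print(provision.__name__, meets_desired_state(provision("node-1")))
```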
What factors make it difficult for automation to be repeatable?
Small mistakes and changes accumulate, especially with bare metal. The moving pieces in infrastructure include new firmware, new devices, changes in enumeration, a different API for a cloud, a tool whose behavior changes with a version update, or components that are no longer available. These factors are often out of your control, and you have to accommodate these changes on an ongoing basis. Something that is functional one day could stop working the next day because of one of these variables. Once something changes, the previous automation is no longer functional, let alone repeatable.
Why does automation commonly have high failure rates?
Mistakes and problems can be hidden by retries: it’s possible to keep trying until you get what you want, ignoring past failures and hiding problems instead of solving them. The RackN approach is that, instead of retrying when something fails, you look for errors and the root cause and put additional checks in. Not using retries is the best way to lower the failure rate of automation.
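As a rough illustration of “checks instead of retries”, the sketch below runs a step exactly once, wraps it in a precheck and a postcheck, and reports the underlying cause on failure. The names `run_once`, `StepFailure`, and `flaky_step` are hypothetical, and the error message is made up for the example.

```python
class StepFailure(Exception):
    """Raised with the underlying cause instead of silently retrying."""

def run_once(step, precheck, postcheck):
    """Run a step a single time, with checks before and after it."""
    problem = precheck()
    if problem:
        raise StepFailure(f"precheck failed: {problem}")
    try:
        result = step()
    except Exception as err:
        # Surface the root cause rather than looping until it happens to pass.
        raise StepFailure(f"step failed: {err}") from err
    problem = postcheck(result)
    if problem:
        raise StepFailure(f"postcheck failed: {problem}")
    return result

def flaky_step():
    """Hypothetical step that fails for a reason worth investigating."""
    raise TimeoutError("DHCP lease not received")

try:
    run_once(flaky_step, precheck=lambda: None, postcheck=lambda result: None)
except StepFailure as failure:
    print(failure, "| cause:", repr(failure.__cause__))
```

The checks return a description of the problem (or `None`), so a failure names what was wrong rather than only that something was.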
How is IaC (hybrid and multi-cloud uses) helpful to a company?
Hybrid and multi-cloud uses of IaC are distributed operations. IaC uses immutable artifacts and various reusable modules, making distributed operations easier. This creates reliability and repeatability that allow collaboration across various teams and systems in a company.
How do Day 2 operations work in practice?
The Day 2 approach is repeatable and reliable automation that also expects changes over time. This means orchestrating systems so that they stay in sync through rolling upgrades and individual user patches.
What impact does immutability have on Day 2 Operations?
An immutability approach influences the way that systems are managed. It requires rebuilding so that individual tools and programs run smoothly together and function as a system, not as individual parts.
What impact does collaboration have on Day 2 Operations?
A Day 2 mentality is future-proofing: building systems that can be handed off to another person and still be used and maintained. Small changes add up over time, so it’s really important to make systems readable, reusable, understandable, and maintainable for future teams or even yourself.
What impact does observability have on Day 2 Operations?
Observability is important to Day 2 because Day 2 operations are quick. They are reactions to changes, made in order to maintain a system. Changes in Day 2 components can be accommodated because of observability. The more observability there is, the better maintained the systems are. In this way, observability improves long-term resilience and the ability to fix things later down the line.
How can we foster a shared development for Infrastructure as Code?
Customers have built developer, test and production systems, and it’s important that you are able to put automation through a Git-controlled gate and validate it. That means you can test it in another developer’s test environment and confirm that it works. Testing it at multiple gates allows collaboration because the automation can then be used in any customized setup: it’s not hard to integrate and use.
How do companies keep sites in sync?
The key to keeping sites in sync is the location of the control plane. You need to keep multiple sites synced, but you don’t want individual sites to go down if a different piece is in trouble. To solve this, a centralized system can push out automation on a planned schedule while also receiving updates about local control planes.
What is common usage of IaC and DevOps practices?
IaC introduces a way to integrate and combine already commonly used tools and practices.
What are the challenges of scaling IaC?
Challenges include handling unpredictable issues as you build as well as setting standards in a way that facilitates future collaboration.
How does GitOps scale IaC?
GitOps scales IaC by combining well-defined immutable artifacts with version control of known configurations.
How can RackN pipelines fit different use cases?
A basic pipeline from RackN has the option to be extended or altered. In the past, when the baseline modules/pipelines were updated, prior customization would be stripped away. RackN looks to give customers a way to customize and still receive baseline updates. This is done through a pipeline concept with known injection points where customers can select various provided customization options.
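The idea of known injection points can be sketched as a baseline list of stages with named placeholders that are filled from a separate customization map. This is a hypothetical model only: the `inject:` prefix, the stage names, and `render_pipeline` are assumptions for illustration and do not reflect actual RackN pipeline syntax.

```python
# Two versions of a hypothetical baseline; v2 adds an "inventory" stage.
BASELINE_V1 = ["discover", "inject:pre-install", "install-os",
               "inject:post-install", "validate"]
BASELINE_V2 = ["discover", "inventory", "inject:pre-install", "install-os",
               "inject:post-install", "validate"]

def render_pipeline(baseline, customizations):
    """Expand each known injection point into the customer's chosen tasks."""
    stages = []
    for stage in baseline:
        if stage.startswith("inject:"):
            point = stage.split(":", 1)[1]
            stages.extend(customizations.get(point, []))
        else:
            stages.append(stage)
    return stages

# Customer choices live apart from the baseline, keyed by injection point.
customer = {"post-install": ["join-domain", "install-agent"]}

print(render_pipeline(BASELINE_V1, customer))
print(render_pipeline(BASELINE_V2, customer))
```

Because the customization is stored separately and keyed by injection point, moving from the first baseline to the second picks up the new stage without stripping the customer’s additions.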
How is customization handled with automation?
A platform model uses modules that can move state: they read state conditions and report that information back, so they can work in multiple states or environments. Systems differ in complexity, so the modules need built-in tests and customization options that let them be fitted easily to a different environment. This way, people don’t have to rebuild the same piece over and over for their one specific use case.
How do distributed operations benefit companies?
Different sites have varying configurations and may need additional individualized action in order to make changes effective. Distributed operations take this into account and help coordinate actions accordingly.
What is observability and why is it important?
Higher level operations rely on smaller individual parts functioning – observability means you can see every single little part underneath as it’s running. It’s difficult to manage automation without observability because things can run and then the container it’s in just goes away. It’s important to understand the process rather than just the result in order to manage a system.
How does observability help operators?
When looking at an already built system, you can still see how things were built down to the individual parts, and therefore troubleshoot in real time.
What is the difference between observability, monitoring and logging?
Observability doesn’t replace monitoring and logging. Monitoring is collecting time series, performance data and health checks. It looks at the results of the system and is external. Logging allows troubleshooting after the fact by collecting data. Observability is more about collaboration and allowing multiple people and teams to see what is happening in a system in real time. This can extend to everyone you work with and future people you’ll work with.
Is logging a cybersecurity threat?
There is a balance between not releasing sensitive logged information and allowing it to be seen in order to troubleshoot. Some information in Digital Rebar that is dynamically generated is typically sensitive and therefore not stored. Sensitive information is not emitted in the logs unless it is necessary. The settings that save this information carry many disclaimers and warnings that the information is sensitive. We address this problem by making everything very clearly labeled.
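One general technique for keeping sensitive values out of logs is to mask them before the log call is made. The sketch below uses Python’s standard `logging` module; the parameter names in `SENSITIVE_KEYS` and the `redact` helper are hypothetical and are not taken from Digital Rebar.

```python
import logging

# Hypothetical parameter names that should never appear in log output.
SENSITIVE_KEYS = {"password", "token", "bmc_password"}

def redact(params):
    """Return a copy of params that is safe to log: sensitive values are masked."""
    return {key: ("***REDACTED***" if key in SENSITIVE_KEYS else value)
            for key, value in params.items()}

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("provision")

params = {"hostname": "node-1", "bmc_password": "s3cret", "token": "abc123"}
log.info("starting task with params=%s", redact(params))
```

The original dictionary is left untouched, so the task can still use the real values while the log shows only that they were present.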
How does Infrastructure as Code help with compliance in the context of cybersecurity?
IaC allows information to be collected at multiple places throughout the pipeline, and that improves observability. Observability helps with compliance because you can make sure that standard compliance data is gathered in each part of the system. This ensures that your whole system is compliant because you know what information is where and what security measures are needed in each location. Observability data and compliance checks go hand in hand.
What is the self-service angle to sharing code between teams?
When code can be shared between teams, it can be shared for any use case – the use of multi-site management is also applicable for multi-team collaboration. In addition to this, individual teams gain the ability to choose if changes made elsewhere are pushed to their individual site/team. Multi-team collaboration allows individual teams to have individual control, while also alerting collaborating teams of changes so that everyone can benefit.
Does collaboration occur within organizations or outside?
A lot of organizations don’t have good internal collaboration and instead are divided into teams, but there needs to be collaboration between teams. Then if you’re making a system that is reusable by different individuals and teams, it might as well include collaboration outside the organization too. The entire system is made for collaboration and readability – for anyone. You can distinguish things to be meant for internal or community collaboration, but you can also combine those spheres of collaboration. The ability to distinguish between internal collaboration versus external is beneficial because if a piece is helpful to the broader community, then share it. If not, that’s fine – just keep it within the company.
How are collaboration and Open Source related?
Peer reviews, check-ins and shared code repositories are important for Open Source. Collaborating on IaC libraries and changing automation systems requires that the work be applicable to more than just your part of the industry. Things shared for Open Source collaboration may not be applicable for everyone, so RackN specifically focuses on having pieces that are commonly usable. Sometimes in Open Source, work that authors share could be used by competitors and end up hurting the authors. RackN’s system of sharing smaller reusable modules helps avoid this problem by curating general community content.
Is pipeline automation across developer, test and production possible?
Yes. Testing is when someone can write IaC and hand it off to a test group that can tell what it is and recreate it themselves. Automation makes testing easier and in turn increases the amount of testing. At the same time, there has to be control of how far the automation reaches – control over what reaches which individual sites. Different teams need to be able to work together on these changes.
What aspect of development is often overlooked?
IaC puts a focus on automation, and there is a lot of modularity in automation. Breaking big infrastructure pipelines down into smaller pieces (workflows, stages, tasks) can look like a lot of moving parts, but those parts are reusable in different contexts. This reusability of modules is often overlooked. Breaking automation down into smaller modular units makes it easier to build other things. It treats automation more like a development construct, in which small modules snap together to build something.
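The workflow, stage and task breakdown can be modeled as small composable objects. The sketch below is a hypothetical data model written for illustration; the class names mirror the terms in the answer above, but the fields and the `execute` method are assumptions rather than an actual RackN interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Task:
    name: str
    run: Callable[[dict], None]

@dataclass
class Stage:
    name: str
    tasks: List[Task] = field(default_factory=list)

@dataclass
class Workflow:
    name: str
    stages: List[Stage] = field(default_factory=list)

    def execute(self, machine: dict) -> List[str]:
        """Run every task in order and return the names that were executed."""
        executed = []
        for stage in self.stages:
            for task in stage.tasks:
                task.run(machine)
                executed.append(f"{stage.name}/{task.name}")
        return executed

# Small tasks that each change one thing on a (hypothetical) machine record.
set_hostname = Task("set-hostname", lambda m: m.update(hostname=m["id"]))
install_os = Task("install-os", lambda m: m.update(os="linux"))

prepare = Stage("prepare", [set_hostname])
bare_metal = Workflow("bare-metal", [prepare, Stage("install", [install_os])])
cloud = Workflow("cloud", [prepare])  # the same stage reused in another workflow

print(bare_metal.execute({"id": "node-1"}))
print(cloud.execute({"id": "vm-7"}))
```

The `prepare` stage is defined once and snapped into two workflows, which is the kind of reuse that is easy to overlook when a pipeline is written as one large script.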