Is Hybrid DevOps Like The Tokyo Metro?

By Dan Choquette

Is DevOps at scale like a major city’s subway system? Both require strict processes and operational excellence to move a lot of different parts at once. How else are they alike?

If you have had the pleasure of riding the Tokyo Metro, you might agree that it’s an interesting – and confusing – experience (especially if you need to change lines!). All told, there are 9 lines and more than 180 stations, with a daily ridership of almost 7 million people!

[Image: Tokyo subway]

A few days ago, I had a conversation with a potential user who is repeatedly deploying Kubernetes with Contrail networking on Google Cloud in a build/test/dev scenario. Another conversation was about the need to provision thousands of x86 bare metal servers once or twice a week with different configurations and networking, with the ultimate goal of controlling their metal as they would a cloud instance in AWS. Cool stuff!

Since we here at RackN believe Hybrid DevOps is a MUST for Hybrid IT (after all, we are a start-up and have bet our very lives on such a thing, so we REALLY believe it!), I thought about how Hybrid DevOps compares to the Tokyo Metro (earlier that day I had read about Tokyo in the news and my mind wandered). In my attempt to draw the parallel, below is an SDLC DevOps framework that you have seen 233 times before in blogs like this one.

[Image: SDLC DevOps framework]

In terms of process, I’m sure you can notice how similar it is to the Metro, right?

<crickets>

<more crickets>

When both operate as they should, they are the epitome of automation, control, repeatability and reliability. A disciplined, automated at-scale DevOps environment does have some similarity to the Ginza or Tozai line: you have different people (think apps) from all walks of life boarding a train, each needing to get somewhere and each needing to follow steps in a process (maybe the “Pusher” is the scrum or DevOps governance tool, but we’ll leave that determination for the end). However, as I compare it to Hybrid DevOps, the Tokyo Metro is not hybrid-tolerant. With subways, if a new subway car is added, tracks are changed, or a new station is added on the fly to better handle congestion, everything stops or turns into a logistical disaster. In addition, there is no way of testing how it will all flow beforehand. There will be operational glitches, and millions of angry customers will not reach their destination in a timely fashion, or at all.

The same is metaphorically true for Hybrid DevOps in Hybrid IT. In theory, the Hybrid DevOps pipeline includes build/test/dev and continuous integration/deployment for all platforms, business models, governance models, security policies and software stacks that depend on the physical/IaaS/container underlay. Developers and operators need to test against multiple platforms (cloud, VM and metal) and, in order to realize value, move into production rapidly while frequently adjusting to changes of all kinds. They also require the ability to layer multiple technologies and security policies into an operational pipeline, which in turn has hundreds of moving parts that require precise configuration settings made in a sequenced, orchestrated manner.

At RackN, we believe the ability to continuously test, integrate, deploy and manage complex technologies in a Hybrid IT scenario is critical to successful adoption in production. The most effective way to accomplish that is to have a central platform in place that can govern Hybrid DevOps at scale: one that can automate, orchestrate and compose all the necessary configurations and components in a sequenced fashion. Without one, haphazard assembly and a lack of governance erode the overall process and lead to failure. Just like the “Pusher” on the platform, without governance both the Tokyo Metro and a Hybrid DevOps model being used at scale for a Hybrid IT use case end in massive delays, dissatisfied customers and chaos.

[Image: Tokyo Metro platform pusher]


Kubernetes 18+ ways – yes, you can have it your way

By Rob Hirschfeld

Lately, I’ve been talking about the general concept of hybrid DevOps adding composability, orchestration and services to traditional configuration. It’s time to add a concrete example, because the RackN team is delivering it with Digital Rebar and Kubernetes.

So far, we have enabled a single open platform to install over 18 different configurations of Kubernetes simply by changing command line flags [videos below].

By taking advantage of the Digital Rebar underlay abstractions and orchestration, we are able to use open community installation playbooks for a wide range of configurations.

So far, we’re testing against:

  • Three different clouds (AWS, Google and Packet), not including the option of using bare metal
  • Two different operating systems (Ubuntu and CentOS)
  • Three different software-defined networking systems (Flannel, Calico and OpenContrail)

Those 18 are just the tip of the iceberg of what we are actively testing. The actual matrix is much deeper.
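
For the curious, the 18 comes straight from multiplying those dimensions. Here is a quick sketch (plain Python, not Digital Rebar code) that enumerates the combinations from the list above:

```python
from itertools import product

# The dimensions listed above; bare metal would add a fourth platform option.
platforms = ["AWS", "Google", "Packet"]
operating_systems = ["Ubuntu", "CentOS"]
sdn_layers = ["Flannel", "Calico", "OpenContrail"]

combinations = list(product(platforms, operating_systems, sdn_layers))
for platform, os_name, sdn in combinations:
    print(f"{platform} + {os_name} + {sdn}")

print(f"total configurations: {len(combinations)}")  # 3 * 2 * 3 = 18
```

Add bare metal as a fourth platform, or another OS or SDN choice, and the count grows immediately; that is why the real matrix runs much deeper.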

BUT THAT’S AN EXPLODING TEST MATRIX!?! No. It’s not.

The composable architecture of Digital Rebar means that all of these variations are isolated. We are not creating 18 distinct variations; instead, the system chains options together and abstracts the differences between steps.

That means that we could add different logging options, test sequences or configuration choices into the deployment with minimal coupling to previous steps. This enables operator choice and vendor injection in a way that allows collaboration around common components. By design, we’ve eliminated fragile installation monoliths.
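
As a rough illustration of that chaining idea (a toy sketch, not the actual Digital Rebar implementation), picture each option as a small step that only reads and writes a shared context, so an optional logging step can be injected without touching its neighbors:

```python
from typing import Callable, Dict, List

Step = Callable[[Dict], Dict]  # each step reads and returns the shared context

def provision(ctx: Dict) -> Dict:
    ctx["nodes"] = f"{ctx['platform']} nodes running {ctx['os']}"
    return ctx

def install_sdn(ctx: Dict) -> Dict:
    ctx["network"] = f"{ctx['sdn']} overlay on {ctx['nodes']}"
    return ctx

def enable_logging(ctx: Dict) -> Dict:
    # Optional step: injected into the chain without changing the other steps.
    ctx["logging"] = "cluster logging enabled"
    return ctx

def run_pipeline(steps: List[Step], ctx: Dict) -> Dict:
    for step in steps:  # sequence matters, but the steps stay decoupled
        ctx = step(ctx)
    return ctx

base = [provision, install_sdn]
with_logging = [provision, install_sdn, enable_logging]  # one change, no rewrites

print(run_pipeline(with_logging, {"platform": "Packet", "os": "Ubuntu", "sdn": "Calico"}))
```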

All it takes is a Packet, AWS or Google account to try this out for yourself!

DevOps workers, your mother was right: always bring a clean Underlay.

Why did your mom care about underwear? She wanted you to have good hygiene. What is good Ops hygiene? It’s not as simple as keeping up with the laundry, but the idea is similar. It means that we’re not going to get surprised by something in our environment that we’d taken for granted. It means that we have a fundamental level of control to keep clean. Let’s explore this in context.

I’ve struggled with the term “underlay” for infrastructure for a long time. At RackN, we generally prefer the term “ready state” to describe getting systems prepared for install; however, underlay fits very well when we consider it as the foundation for building up a platform like Kubernetes, Docker Swarm, Ceph and OpenStack. Even more than single-operator applications, these community-built platforms require carefully tuned and configured environments. In my experience, getting the underlay right dramatically reduces installation challenges for the platform.

What goes into a clean underlay? All your infrastructure and most of your configuration.

Just buying servers (or cloud instances) does not make a platform. Cloud underlay is nearly as complex, but let’s assume metal here. To turn nodes into a cluster, you need to set up their RAID and BIOS. Generally, you’ll also need to configure out-of-band management IPs and security. Those RAID and BIOS settings are specific to the function of each node, so you’d better get them right. Then install the operating system. That will need access keys, IP addresses, names, NTP, DNS and proxy configuration just as a start. Before you connect to the wider network, make sure to update to your local mirror and site-specific requirements. Installing Docker or an SDN layer? You may have to patch your kernel. It’s already overwhelming, and we have not even gotten to the platform-specific details!
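
To make that sequence concrete, here is an illustrative checklist in code; the step names are hypothetical, not a Digital Rebar workflow, but the point is that the order and the per-site details are both part of the configuration:

```python
# Illustrative ready-state checklist; every item carries site-specific settings
# and most of them must happen in this order.
UNDERLAY_STEPS = [
    ("configure_raid",        "per-role RAID level for each node"),
    ("configure_bios",        "boot order, virtualization, power settings"),
    ("setup_oob_management",  "IPMI/iLO addresses, credentials and security"),
    ("install_os",            "base operating system image"),
    ("inject_access_keys",    "SSH keys for the ops team"),
    ("assign_network",        "IP addresses, hostnames, NTP, DNS, proxy"),
    ("point_at_local_mirror", "package mirror and site-specific repos"),
    ("patch_kernel",          "only if Docker or the SDN layer requires it"),
]

def build_underlay(node: str) -> None:
    for step, description in UNDERLAY_STEPS:
        # A real system would call out to hardware or OS tooling for each step;
        # here we only show that the sequence is long and strictly ordered.
        print(f"{node}: {step} -> {description}")

build_underlay("rack1-node01")
```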

Buried in this long sequence of configurations are critical details about your network, storage and environment.

Any mistake here and your install goes off the rails. Imagine that you’re building a house: it’s very expensive to change the plumbing lines once the foundation is poured. Thankfully, software configuration is not concrete, but the cost of dealing with a bad setup is just as frustrating.

The underlay is the foundation of your install. It needs to be automated and robust.

The challenge compounds once an installation is already in progress because adding the application changes the underlay. When (not if) you make a deploy mistake, you’ll have to either reset the environment or make your deployment idempotent (meaning, able to run the same script multiple times safely). Really, you need to do both.

Why do you need both fast resets and component idempotency? They each help you troubleshoot issues but in different ways. Fast resets ensure that you understand the environment your application requires. Post install tweaks can mask systemic problems that will only be exposed under load. Idempotent action allows you to quickly iterate over individual steps to optimize and isolate components. Together they create resilient automation and good hygiene.
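
Here is a minimal sketch of the idempotent half of that pairing (a hypothetical step, not RackN code): the action checks the current state before changing anything, so re-running the whole script after a fast reset, or just this one step while troubleshooting, is always safe:

```python
import os

def ensure_line(path: str, line: str) -> bool:
    """Idempotent step: add `line` to `path` only if it is not already there.

    Returns True if a change was made, False if the system was already in the
    desired state, so repeated runs converge instead of piling up duplicates.
    """
    existing = ""
    if os.path.exists(path):
        with open(path) as f:
            existing = f.read()
    if line in existing.splitlines():
        return False  # already correct; safe to run again
    with open(path, "a") as f:
        f.write(line + "\n")
    return True

# Running this twice changes the file once, then reports "no change".
changed = ensure_line("/tmp/demo-ntp.conf", "server ntp.example.com iburst")
print("changed" if changed else "no change")
```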

In my experience, the best deployments involved a non-recoverable/destructive performance test followed by a completely fresh install to reset the environment: the Ops equivalent of a full dress rehearsal to flush out issues. I’ve seen similar concepts promoted around the Netflix Chaos Monkey pattern.

If your deployment is too fragile to risk breaking in development and test, then you’re signing up for an ongoing life of firefighting. In that case, you’ll definitely need all the “clean underwear” you can find.

We need DevOps without Borders! Is that “Hybrid DevOps?”

The RackN team has been working on making DevOps more portable for over five years.  Being portable between vendors, sites, tools and operating systems means that our automation needs to be hybrid in multiple dimensions by design.

Why drive for hybrid?  It’s about giving users control.

I believe that applications should drive the infrastructure, not the reverse.  I’ve heard many times that the “infrastructure should be invisible to the user.”  Unfortunately, lack of abstraction and composability makes it difficult to code across platforms.  I like the term “fidelity gap” to describe the cost of these differences.

What keeps DevOps from going hybrid?  Shortcuts related to platform-entangled configuration management.

Everyone wants to get stuff done quickly; however, we make the same hard-coded ops choices over and over again.  Big bang configuration automation that embeds sequence assumptions into the script is not just technical debt, it’s fragile and difficult to upgrade or maintain.  The problem is not configuration management (that’s a critical component!), it’s the lack of system level tooling that forces us to overload the configuration tools.

What is system level tooling?  It’s integrating automation that expands beyond configuration into managing sequence (aka orchestration), service orientation, script modularity (aka composability) and multi-platform abstraction (aka hybrid).
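
One way to picture how those four factors fit together (purely a sketch, not the Digital Rebar API): orchestration owns the sequence, composability keeps each module swappable, service orientation hides tools behind small interfaces, and the hybrid abstraction keeps platform details out of the modules:

```python
from abc import ABC, abstractmethod
from typing import List

class Platform(ABC):
    """Hybrid abstraction: the pipeline never cares which provider this is."""
    @abstractmethod
    def create_node(self, name: str) -> str: ...

class CloudPlatform(Platform):
    def create_node(self, name: str) -> str:
        return f"cloud instance {name}"

class MetalPlatform(Platform):
    def create_node(self, name: str) -> str:
        return f"bare metal node {name}"

class Module(ABC):
    """Composability: small, swappable units instead of one big script."""
    @abstractmethod
    def apply(self, platform: Platform) -> str: ...

class ProvisionNodes(Module):
    def apply(self, platform: Platform) -> str:
        return platform.create_node("worker-01")

class RegisterDns(Module):
    """Service orientation: talk to a service API, not a hand-edited file."""
    def apply(self, platform: Platform) -> str:
        return "worker-01 registered with the DNS service"

def orchestrate(modules: List[Module], platform: Platform) -> None:
    """Orchestration: the sequence is explicit and controlled in one place."""
    for module in modules:
        print(module.apply(platform))

orchestrate([ProvisionNodes(), RegisterDns()], MetalPlatform())
```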

My ops automation experience says that these four factors must be solved together because they are interconnected.

What would a platform that embraced all these ideas look like?  Here is what we’ve been working towards with Digital Rebar at RackN:

Mono-Infrastructure IT vs. “Hybrid DevOps”:

  • Locked into a single platform → Portable between sites and infrastructures with layered ops abstractions.
  • Limited interop between tools → Adaptive to mix and match best-for-job tools.  Use the right scripting for the job at hand and never force-migrate working automation.
  • Ad hoc security based on site specifics → Secure using repeatable automated processes.  We fail at security when things get too complex to change and adapt.
  • Difficult to reuse ops tools → Composable modules enable ops pipelines.  We have to be able to interchange parts of our deployments for collaboration and upgrades.
  • Fragile configuration management → Service orientation simplifies API integration.  The number of APIs and services is increasing; configuration management alone is not sufficient.
  • Big bang “configure then deploy” scripting → Orchestrated action is critical because sequence matters.  Building a cluster requires sequential (often iterative) operations between nodes in the system.  We cannot build robust deployments without ongoing control over order of operations.

Should we call this “Hybrid Devops?”  That sounds so buzz-wordy!

I’ve come to believe that Hybrid DevOps is the right name.  More technical descriptions like “composable ops” or “service oriented devops” or “cross-platform orchestration” just don’t capture the real value.  All these names fail to capture the portability and multi-system flavor that drives the need for user control of hybrid in multiple dimensions.

Simply put, we need devops without borders!

What do you think?  Do you have a better term?

Smaller Nodes? Just the Right Size for Docker!

Container workloads have the potential to redefine how we think about scale and hosted infrastructure.

Last Fall, Ubiquity Hosting and RackN announced a 200 node Docker Swarm cluster as phase one of our collaboration. Unlike cloud-based container workload demonstrations, we chose to run this cluster directly on the bare metal.

Why bare metal instead of virtualized? We believe that metal offers additional performance, availability and control.  

With the cluster automation ready, we’re looking for customers to help us prove those assumptions. While we could simply build on many VMs, our analysis is that a lot of smaller nodes will distribute work more efficiently. Since there is no virtualization overhead, lower-RAM systems can still give great performance.

The collaboration with RackN allows us to offer customers a rapid, repeatable cluster capability. Their Digital Rebar automation works on a broad spectrum of infrastructure, allowing our users to rehearse deployments on cloud, quickly change components and iteratively tune the cluster.

We’re finding that these dedicated metal nodes have much better performance than similar VMs in AWS. Don’t believe us? You can use Digital Rebar to spin up both and compare.  Since Digital Rebar is an open source platform, you can explore and expand on it.

The Docker Swarm deployment is just a starting point for us. We want to hear your provisioning ideas and work to turn them into reality.

2015 Container Review

It’s been a banner year for container awareness and adoption so we wanted to recap 2015.  For RackN, container acceleration is near to our heart because we both enable and use them in fundamental ways.   Look for Rob’s 2016 predictions on his blog.

The RackN team has truly deep and broad experience with containers in practical use.  In the summer, we delivered multiple container orchestration workloads including Docker Swarm, Kubernetes, Cloud Foundry, StackEngine and others.  In the fall, we refactored Digital Rebar to use Docker Compose with dramatic results.  And we’ve been using Docker since 2013 (yes, “way back”) for ops provisioning and development.

To make it easier to review that experience, we are consolidating a list of our container related posts for 2015.

General Container Commentary

RackN & Digital Rebar Related

From Start to Scale: learn faster with heterogeneous deployments

Why mix VMs and Physical? Having a consistent deploy approach can dramatically speed learning cycles that result in better scale ops. I would never deploy production OpenStack on VMs but I strongly recommend rehearsing that deployment on VMs hundreds of times before I touch metal.

Over the last two months, the RackN team redefined “heterogeneous” infrastructure in Digital Rebar from being “just” multi-vendor hardware to including any server resource, from containers and Vagrant/VirtualBox to clouds like AWS or Packet. To support this truly diverse range, there were both technical and operational challenges to overcome.

The technical challenge rises from the fundamental control differences between cloud and physical infrastructure. In cloud, infrastructure is much more prescribed – you cannot change most aspects of your system and especially not your network interfaces or IPs. To provision hardware efficiently, we had to establish control over the very things that Cloud systems manage for you. 

That management diversity exercised the full extent of the Digital Rebar “functional ops” architecture.

Over the last year, we’ve been unwinding baked-in control assumptions from earlier versions of Digital Rebar. That added flexibility allows Digital Rebar to mix control APIs for infrastructure ranging from Cobbler to Docker, Vagrant and AWS. Since we could already cope with heterogeneous control APIs using Digital Rebar’s unique functional ops design, we retained the ability to mix and match container, virtual and physical infrastructure.

The operational challenge was more subtle. We were motivated to make this change by first-hand observations of the fidelity gap. I am a strong believer that container platforms will directly target metal in the next two years. The challenge is how we get there from our current virtualization-focused infrastructure.

It’s easy to look at the completed work as an obvious step forward. Looking over my shoulder, I know that it took years of learning and perseverance to create a platform that was flexible enough to handle both extremes of control. Even more important was understanding why it was so important for a physical scale deployment platform to provide ops fidelity for developers too.

With the infrastructure work behind us, we’re seeing Digital Rebar deliver real operational transformation. We want to help IT embrace containers and immutable infrastructure without having to discard the hard won battles installing cloud and traditional infrastructure. Most critically, we hope that you’ll join our open community and share your operational journey with us.

Faster, Simpler AND Smaller – Immutable Provisioning with Docker Compose!

Nearly 10 TIMES faster system resets – that’s the result of fully enabling a multi-container immutable deployment on Digital Rebar.

I’ve been having a “containers all the way down” month since we launched Digital Rebar deployment using Docker Compose. I don’t want to imply that we rubbed Docker on the platform and magic happened. The RackN team spent nearly a year building up the Consul integration and service wrappers for our platform before we were ready to fully migrate.

During the Digital Rebar migration, we took our already service-oriented code base and broke it into microservices. Specifically, the Digital Rebar parts (the API and engine) now run in their own container, and each service (DNS, DHCP, Provisioning, Logging, NTP, etc.) also has a dedicated container. Likewise, supporting items like Consul and PostgreSQL are, surprise, managed in dedicated containers too. Altogether, that’s over nine containers, and we continue to partition out services.

We use Docker Compose to coordinate the start-up and Consul to wire everything together. Both play a role, but Consul is the critical glue that allows Digital Rebar components to find each other. These were not random choices. We’ve been using a Docker package for over two years and using Consul service registration as an architectural choice for over a year.

Service registration plays a major role in the functional ops design because we’ve been wrapping datacenter services like DNS with APIs. Consul provides the separation between providing and consuming the service. Our previous design required us to track the running service. This worked until customers asked for pluggable services (and every customer needs pluggable services as they scale).
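
As a hedged sketch of that separation using Consul’s standard agent HTTP API (the service name, port and tags below are made up for illustration; this is not the Digital Rebar integration itself), the provider side simply registers itself:

```python
import requests

# Hypothetical example: register a "dns" service with the local Consul agent.
service = {
    "Name": "dns",
    "Port": 8600,
    "Tags": ["digital-rebar", "example"],
}

resp = requests.put(
    "http://localhost:8500/v1/agent/service/register",
    json=service,
    timeout=5,
)
resp.raise_for_status()
print("registered dns service with the local Consul agent")
```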

Besides being faster to reset the environment, there are several additional wins:

  1. More transparent in how it operates – it’s obvious which containers provide each service and easy to monitor them as individuals.
  2. Easier to distribute services in the environment – we can find where a service runs because of the Consul registration, so we don’t have to manage it (see the lookup sketch after this list).
  3. Possible to have redundant services – it’s easy to spin up new services, even on the same system.
  4. Services become pluggable – as long as the service registers and there’s an API, we can replace the implementation.
  5. No concern about which distribution is used – all our containers are Ubuntu user space, but the host can be anything.
  6. Changes to components are more isolated – changing one service does not require a lot of downloading.
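
Picking up on the second point above, the consumer side asks Consul where a service lives instead of tracking it by hand. Again a hypothetical sketch, reusing the made-up “dns” service from the registration example earlier:

```python
import requests

# Ask Consul's catalog where the "dns" service is running rather than
# tracking its host and port ourselves.
resp = requests.get("http://localhost:8500/v1/catalog/service/dns", timeout=5)
resp.raise_for_status()

for instance in resp.json():
    address = instance.get("ServiceAddress") or instance.get("Address")
    print(f"dns service available at {address}:{instance['ServicePort']}")
```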

Docker and microservices are not magic but the benefits are real. Be prepared to make architectural investments to realize the gains.
