RackN fills holes with Drill Release

Drill Man! by BruceLowell.com [creative commons]

We’re so excited about our in-process release that we’ve been relatively quiet about the last OpenCrowbar Drill release (video tour here).  That’s not a fair reflection of the capability and maturity of the code base; still, Drill’s purpose was to set the stage for truly ground-breaking ops automation work in the next release (“Epoxy”).

So, what’s in Drill?  Scale and Containers on Metal Workloads!  [official release notes]

The primary focus for this release was proving our functional operations architectural pattern against a wide range of workloads, and that is exactly what the RackN team has been doing with Ceph, Docker Swarm, Kubernetes, CloudFoundry and StackEngine workloads.

In addition to workloads, we put the platform through its paces in real ops environments at scale.  That resulted in even richer network configurations and options plus performance and tuning.  The RackN team continues to adapt the platform to match real-world ops.

We believe that operations tools should adapt to their environments, not vice versa.

We’ve encountered some pretty extreme quirks, and our philosophy is to embrace them rather than force users to change tools or processes.  For example, Drill automatically keeps the last IPv4 octets aligned between interfaces.  Even better, we can help slipstream migrations (like IPv4 to IPv6) in place to minimize disruptions.
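To make the octet-alignment point concrete, here is a tiny conceptual sketch (illustration only, not OpenCrowbar’s implementation): a node that is host 17 on its admin network stays host 17 on its storage network, even though the networks differ.

```python
import ipaddress

# Conceptual illustration only: give a node the same final octet (host
# number) on every network it joins, so addresses stay aligned per node.
NETWORKS = {
    "admin":   ipaddress.ip_network("10.1.0.0/24"),
    "storage": ipaddress.ip_network("10.2.0.0/24"),
}

def aligned_addresses(host_number):
    """Return one address per network, all sharing the same last octet."""
    return {name: str(net.network_address + host_number)
            for name, net in NETWORKS.items()}

print(aligned_addresses(17))
# {'admin': '10.1.0.17', 'storage': '10.2.0.17'}
```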

This is the top lesson you’ll see reflected in the Epoxy release:  RackN will keep finding ways to adapt to the ops environment.  

Deploy to Metal? No sweat with RackN’s new Ansible Dynamic Inventory API

With the recent OpenCrowbar v2.3 (Drill) release, the RackN team takes our already super easy Ansible integration to a new level with added SSH key control and dynamic inventory.  These two items make full metal control more accessible than ever for Ansible users.

The platform offers full key management.  You can add keys at the system, deployment (group of machines) and machine levels.  These keys can be set by the operator and can be added or removed after provisioning has completed.  If you want to control access on a per-server or per-group-of-servers basis, OpenCrowbar provides that control via our API, CLI and UI.
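As a rough sketch of what the machine-level flow could look like over the API, here is a hedged Python example.  The endpoint path, field name and credentials below are illustrative assumptions rather than confirmed OpenCrowbar API details; the API, CLI and UI documentation is the authoritative reference.

```python
import requests

CROWBAR = "http://crowbar-admin:3000"   # assumed admin endpoint, for illustration
AUTH = ("crowbar", "crowbar")           # assumed credentials; use your own

def add_machine_key(machine_id, public_key):
    """Hypothetical sketch: attach an extra SSH public key to one machine.

    The '/api/v2/machines/{id}' route and the 'ssh_keys' field are
    illustrative assumptions, not confirmed OpenCrowbar API details.
    """
    resp = requests.put(
        f"{CROWBAR}/api/v2/machines/{machine_id}",
        json={"ssh_keys": [public_key]},
        auth=AUTH,
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```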

We also provide an API path for Ansible dynamic inventory.  Using the simple Python client script (reference example), you can instantly pull a complete, up-to-date node inventory of your system.  The inventory data includes items like the number of disks, CPUs and amount of RAM.  If you’ve grouped machines in OpenCrowbar, those groups are passed to Ansible.  Even better, the metadata schema includes the networking configuration and machine status.
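For readers who have not written an Ansible dynamic inventory client before, the sketch below shows the general shape such a script could take.  Only the `--list`/`--host` convention and the JSON layout (groups plus `_meta.hostvars`) are standard Ansible behavior; the OpenCrowbar endpoint and field names used here are assumptions for illustration, and the referenced Python client script remains the authoritative example.

```python
#!/usr/bin/env python
"""Minimal Ansible dynamic-inventory sketch backed by an OpenCrowbar API.

The endpoint path and JSON field names are illustrative assumptions;
only the --list / --host JSON contract shown here is standard Ansible.
"""
import json
import sys

import requests

CROWBAR = "http://crowbar-admin:3000"   # assumed admin node address
AUTH = ("crowbar", "crowbar")           # assumed credentials; use your own

def fetch_nodes():
    resp = requests.get(CROWBAR + "/api/v2/nodes", auth=AUTH, timeout=10)
    resp.raise_for_status()
    return resp.json()

def build_inventory(nodes):
    # Ansible expects group names mapping to host lists, plus per-host
    # variables under _meta.hostvars.
    inventory = {"all": {"hosts": []}, "_meta": {"hostvars": {}}}
    for node in nodes:
        name = node["name"]              # assumed field name
        inventory["all"]["hosts"].append(name)
        inventory["_meta"]["hostvars"][name] = node
    return inventory

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "--host":
        # Per-host variables already live in _meta, so an empty dict is fine.
        print(json.dumps({}))
    else:
        print(json.dumps(build_inventory(fetch_nodes()), indent=2))
```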

With no added configuration, you can immediately use Ansible as your multi-server CLI for ad-hoc actions and installation using playbooks.
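As a quick usage illustration (assuming the inventory sketch above is saved as an executable `crowbar_inventory.py`, and with `site.yml` standing in for whatever playbook you actually run), an ad-hoc check across every known machine is `ansible all -i crowbar_inventory.py -m ping`, and a playbook run is `ansible-playbook -i crowbar_inventory.py site.yml`.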

Of course, the OpenCrowbar tools are also available if you need remote power control or want a quick re-image of the system.

RackN respects that data centers are heterogeneous.  Our vision is that your choice of hardware, operating system and network topology should not break DevOps deployments!  That’s why we work hard to provide useful abstracted information.  We want to work with you to help make sure that OpenCrowbar provides the right details to create best practice installations.

For working with bare metal, there’s no simpler way to deliver consistent repeatable results.

RackN fills holes with OpenCrowbar Drill Release

By Rob Hirschfeld

I’ve been relatively quiet about the OpenCrowbar Drill release, and that’s not a fair reflection of the capability and maturity of the code base; however, it really just sets the stage for truly ground-breaking ops automation work in the next release (“Epoxy”).

So, what’s in Drill?  Scale and Containers on Metal Workloads!  https://github.com/opencrowbar/core/releases

The primary focus for this release was proving our functional operations architectural pattern against a wide range of workloads, and that is exactly what the RackN team has been doing with Ceph, Docker, Kubernetes, CloudFoundry and StackEngine workloads.

In addition to workloads, we put the platform through its paces in real ops environments at scale.  That resulted in even richer network configurations and options plus performance and tuning.  The RackN team continues to adapt OpenCrowbar to match real-world ops.

One critical lesson you’ll see more of in the Epoxy release: OpenCrowbar and the team at RackN will keep finding ways to adapt to the ops environment.  We believe that tools should adapt to their environments: we’ve encountered some pretty extreme quirks, and our philosophy is to embrace them rather than force change.

Defending Ops without Killing Unicorns

By Rob Hirschfeld

Cloud Ops is a brutal business: operators are expected to maintain a stable and robust operating environment while also embracing waves of disruptive changes using unproven technologies. While we want to promote these promising new technologies, the unicorns, operators still have to keep the lights on; consequently, most companies turn to outside experts or internal strike teams to get this new stuff working.

Our experience is that an on-site deployment by professional services (PS) is often much harder than expected. Why? Because of inherent mission conflict. The PS paratrooper team sent to accomplish the “install Foo!” mission is at odds with the operators’ maintain-and-defend mission. Where the short-term team is willing to blast open a wall for access, the long-term team is highly averse to collateral damage. Both teams are faced with an impossible situation.

I’ve been promoting Open Ops around a common platform (obviously, OpenCrowbar in my opinion) as a way to address cross-site standardization.

Why would a physical automation standard help? Generally, the pros expect to arrive with everything in a “ready state,” including the OS installed and all the networking ready. Unfortunately, there’s a significant gap between an OS installed and … Installs are always a journey of discovery as the teams figure out the real requirements.

Here are some questions that we’ve put together to gauge if the installs are really going the way you think:

  • How often is the customer site ready for deployment?  If not, how long does that take to correct?
  • How far into a deployment do you get before an error in deployment is detected?  How often is that error repeated across all the systems?
  • How often is an “error” actually an operational requirement at the site that cannot be changed without executive approval and weeks of work?
  • How often are issues found after deployment is started that cause an install restart?
  • Can the deployment be recreated on another site?  Can the install be recreated in a test environment for troubleshooting?
  • How often are systems hand-updated or custom-updated as part of a routine installation?
  • How often are developers needed to troubleshoot issues that end up being configuration related?
  • How often are back-doors left in place to help with troubleshooting?
  • What is the upgrade process like?  Is the state of the “as left” system sufficiently documented to plan an upgrade?
  • What happens if there’s a major OS upgrade or patch required that impacts the installed application?
  • Can changes to the site be rolled out in stages?
  • Can the upgrade be automated and rehearsed?