Why is RackN advancing OpenStack on Kubernetes?

Yesterday, RackN CEO, Rob Hirschfeld, described the remarkable progress in OpenStack on Kubernetes using Helm (article link).  Until now, RackN had not been willing to officially support OpenStack deployments; however, we now believe that this approach is a game changer for OpenStack operators even if they are not actively looking at Kubernetes.

We are looking for companies that want to join in this work and fast-track it into production. If this is interesting, please contact us at sre@rackn.com.

Why should you sponsor? Current OpenStack operators facing “fork-lift upgrades” should want to find a path like this one that ensures future upgrades are baked into the plan. This approach provide a fast track to a general purpose, enterprise grade, upgradable Kubernetes infrastructure.

Here is Rob’s Demo

Rob’s Original Blog Post

RackN revisits OpenStack deployments with an eye on ongoing operations. I’ve been an outspoken skeptic of a Joint OpenStack Kubernetes Environment because I felt that the technical hurdles of cloud native architecture would prove challenging.

I was wrong: I underestimated how fast these issues could be addressed.

… read the rest at Beyond Expectations: OpenStack via Kubernetes Helm (Fully Automated with Digital Rebar) — Rob Hirschfeld

Shouldn’t we have Standard Automation for Commodity Infrastructure?

sre-seriesOur focus on SRE series continues… At RackN, we see a coming infrastructure explosion in both complexity and scale. Unless our industry radically rethinks operational processes, current backlogs will escalate and stability, security and sharing will suffer.

bookAn entire chapter of the Google SRE book was dedicated to the benefits of improving data center provisioning via automation; however, the description was abstract with a focus on the importance of validation testing and self-healing. That lack of detail is not surprising: Google’s infrastructure automation is highly specialized and considered a competitive advantage.

Shouldn’t everyone be able to do this?

After all, data centers are built from the same basic components with the same protocols.

Unfortunately, the stack of small (but critical) variations between these components makes it very difficult to build a universal solution. Reasonable variations like hardware configuration, vendor out-of-band management protocol, operating system, support systems and networking topologies add up quickly. Even Google, with their tremendous SRE talent and time investments, only built a solution for their specific needs.

To handle this variation, our SRE teams bake assumptions about their infrastructure directly into their automation. That’s expedient because there’s generally little operational reward for creating generic solutions for specific problems. I see this all the time in data centers that have server naming conventions and IP address schemes that are the automation glue between their tools and processes. While this may be a practical tactic for integration, it is fragile and site specific.

Hard coding your operational environment into automation has serious downsides.

First, it creates operational debt [reference] just like hard coding values in regular development. Please don’t mistake this as a call for yak shaving provisioning scripts into open ended models! There’s a happy medium where the scripts can be robust about infrastructure like ips, NIC ordering, system names and operating system behavior without compromising readability and development time.

Second, it eliminates reuse because code that works in one place must be forked (or copied) to be used again.  Forking creates a proliferation of truth and technical debt.  Unlike a shared script, the forked scripts do not benefit from mutual improvements.  This is true for both internal use and when external communities advance.  I have seen many cases where a company’s decision to fork away from open source code to “adjust it for their needs” cause them to forever lose the benefits accrued in the upstream community.

Consequently, Ops debt is quickly created when these infrastructure specific items are coded into the scripts because you have to touch a lot of code to make small changes. You also end up with hidden dependencies

However, until recently, we have not given SRE teams an alternative to site customization.

Of course, the alternative requires some additional investment up front.  Hard coding and forking are faster out of the gate; however, the SRE mandate is to aggressively reduce ongoing maintenance tasks wherever possible.  When core automation is site customized, Ops loses the benefits of reuse both internally and externally.

That’s why we believe SRE teams work to reuse automation whenever possible.

rebar-1Digital Rebar was built from our frustration watching the OpenStack community struggle with exactly this lesson.  We felt that having a platform for sharing code was essential; however, we also observed that differences between sites made it impossible to share code.  Our solution was to isolate those changes into composable units.  That isolation allowed us take a system integration view that did not break when inevitable changes were introduced.

If you are interested in breaking out of the script customization death spiral then review what the RackN team has done with Digital Rebar.

Even if you don’t use the code, the approach could save your SRE team a lot of heartburn down the road.  Of course, if you do want to use it then just contact us at sre@rackn.com.

Six Perils of DIY Provisioning

This was posted by RackN CEO, Rob Hirschfeld, about his adventures in the field talking with real operators….

Today, I’m sharing a parable about always being focused on adding value. Recently, I was on a call with an IT Ops manager who insisted that his team had their on-premises operations under control with “python scripts and manual kickstart files” because they “really don’t change their infrastructure setup.” He explained that he and his team […]

via Apparently IT death smells like kickstart files. Six Reasons why. — Rob Hirschfeld

sre-seriesFor more about these topics, check out our SRE Series.

The Danger of SRE Backlogs

This post is part of an SRE series grounded in the ideas inspired by the Google SRE book. Every Ops team I know is underwater and doesn’t have the time to catch their breath. Why does the load increase and leave Ops behind? It’s because IT is increasingly fragmented and siloed by both new tech and […]

via Spiraling Ops Debt & the SRE coding imperative — Rob Hirschfeld

Surgical Ansible & Script Injections before, during or after deployment

RackN CEO, Rob Hirschfeld, has been posting about our unique composable operations approach with Digital Rebar to enable hybrid infrastructure and mix-and-match underlay tooling.

This post shows some remarkable flexibility enabled by the approach that allow operators to take limited, secure operations against running systems.

via Surgical Ansible & Script Injections before, during or after deployment. — Rob Hirschfeld

 

Digital Rebar Training Videos

We’re excited to announce an updated set of Digital Rebar training videos.  In response to requests to go beyond the simple Quick Start guide, we created a dedicated training channel and have been producing 15 minute tutorials on a wide range of topics.

Want us to cover a topic?  Just ask us on Gitter!

 

In some cases, these videos contain information that has not made it into documentation yet.  Our documentation is open source, we’d love to incorporate your notes to help make the experience easier for the next user.

Thanks!

The rise of Site Reliability Engineers (SRE) — Rob Hirschfeld

Using infrastructure effectively is a competitive advantage for Google and their SREs carry tremendous authority and respect for executing on that mission.

I’ve been writing about Site Reliability Engineering (SRE) tasks for nearly 5 years under a lot of different names such as DevOps, Ready State, Open Operations and Underlay Operations. SRE is a term popularized by Google (there’s a book!) for the operators who build and automate their infrastructure. Their role is not administration, it is redefining how infrastructure is used and managed within Google.

SRE is about operational excellence and we keep up with the increasingly rapid pace of IT.  It’s a recognition that we cannot scale people quickly as we add infrastructure.  And, critically, it is not infrastructure specific.

via Evolution or Rebellion? The rise of Site Reliability Engineers (SRE) — Rob Hirschfeld

Why RackN Is Joining Infrastructure Masons

A few months ago, Dean Nelson, who for many years ran the eBay data center strategy shared his vision of Infrastructure Masons ( http://www.infrastructuremasons.org ) with me and asked us if we were interested in participating. He invited us to attend the very first Infrastructure Masons Leadership Summit which is being held November 16th at Google HQ in Sunnyvale, CA. We were honored and are looking forward to it.

In short, the Infrastructure Masons organization is comprised of technologists, executives and partners entrusted with building and managing the physical and logical structures of the Digital Age. They are dedicated to the advancement of the industry, development of their fellow masons, and empowering business and personal use of infrastructure to better the economy, the environment, and society.membership-symbol

During our conversation, like water, electricity, or transportation, Dean explained his belief is the majority of internet users expect instant connectivity to their online services without much awareness of the complexity and scale of the physical infrastructure that makes those services possible (I try to explain this to my children and they don’t care) and is taken for granted. Dean wants people and organizations that enable connectivity to receive more recognition for their contributions, share their ideas, collaborate and advance infrastructure technology to the next level. With leaders from FaceBook, Google and Microsoft (RackN too!) participating, he has the right players at the table who are committed to help deliver on his vision.

Managing multiple clouds, data centers, services, vendors and operational models should not be an impediment to progress for CIOs, CTO’s, cloud operators and IT administrators but an advantage. The overwhelming complexity in melding networking, containers and security should be simple and necessary. In the software-defined infrastructure, utility-based computing needs to be just like turning on a light or running a water faucet. RackN believes the ongoing lifecycle of automating, provisioning and managing hybrid clouds and data center infrastructure together under one operational control plane is possible. At RackN, we work hard to make that vision a reality. 

Looking forward into the future, we share Dean’s vision and look forward to helping drive the Infrastructure Masons mission.

Want to hear more?  Read an open ops take by RackN CEO, Rob Hirschfeld.

Author: Dan Choquette, Co-Founder/COO of RackN

Cloud Migrations & What We Can All Learn From NBA Legend Allen Iverson

 

new aaNBA Hall-Of-Famer and 2001 league MVP Allen Iverson averaged more than 26 points a game during his career. He is recognized as one of the greatest to play his position. He is also well-known for his, “We’re Talkin’ ‘Bout Practice” rant (it’s on YouTube. Pure comedy). For multiple reasons, Iverson found little value practicing with his teammates as he felt he was “The Answer” (his actual nickname) to the team’s championship hopes. Even though it would have made the team more well-rounded, he didn’t feel the need to offload some of his offensive responsibilities to other players. Needless to say, Iverson never won an NBA title during his 17 year career.

I am seeing similar parallels with organizations which have “lifted and shifted” their legacy workloads to the cloud. While many have hit the traditional Day 1 problems such as buggy containers and the normal networking, hypervisor compatibility, security, compliance and performance issues, many are also experiencing buyer’s remorse once they get to the cloud as they have not fully embraced the risk of not making their workload stack portable from AWS to GCE or even back to private IaaS and have also overlooked the role of the existing staff of IT engineers in this journey.

In my opinion, cloud portability is a nebulous term. Products such as RightScale, Scalr, Morpheus and other cloud managers are excellent at moving workloads from one cloud to another but are actually only doing half the job. When advanced technologies such as SDN, container orchestration technologies, microservices and the constant churn of configuration updates and changes which impact the entire stack, it presents a lifecycle management nightmare.  Manually updating cookbooks, manifests and automation runbooks and composing these operations in a monolithic format is an arduous chore and a huge time sink. Additionally, if the workload has been refactored for only AWS/Cloud Formation and a couple of months later consumption costs become unbearable, the customer is locked-in and any OPEX advantage hoped to be gained has evaporated.  maxresdefault

We are also seeing the traditional IT engineer squeezed out of the cloud movement- and they shouldn’t. While it is sensible for data centers to be consolidated (or shut down entirely) over a 3-5 year period, security, data locality, compliance and licencing are all major factors considering keeping some on-prem IaaS and physical gear. In the event in which a workload needs to be moved back to a VM cluster, private cloud or on bare metal, who is there to ensure that these important functions are addressed? While they are not coders or full-stack DevOps engineers by trade,  with collaboration, intelligent automation tools which make the right provisioning and configuration decisions that follow a unified CloudOps model,  IT engineers help DevOps teams focus on getting the cloud to where it needs to be and perceived CAPEX and OPEX benefits realized.

At RackN, we automatically abstract away the unknown complexities of cloud migration-to-production use cases and allow CIOs to continue to innovate, modernize and make portable their workloads and operational models. We believe the cloud and infrastructure underlay pertaining to platform portability and the traditional IT engineer are critical teammates needed to be part of a winning strategy even if Allen Iverson doesn’t think so.

 

About the Author:

Dan Choquette is Co-Founder/COO of RackN. With Allen Iverson as inspiration, Dan will continue to work on his jump shot (which is an effort in futility).

 

Bugs Bunny, Prince and Enabling True Hybrid Infrastructure Consumption

OK- Stay with me on this. I’m drawing parallels again.  🙂

Like many from my generation, my initial exposure to classical music and opera was derived from Bugs Bunny on Saturday mornings (culturally deprived, I know). One of the cartoons I remember well was with Bugs trying to get even with the heavy-set opera singer who disrupts Bugs’ banjo playing. In order to exact his revenge, Bugs infiltrates the opera singer’s concert by impersonating the famous long-hared (hared…get it?) conductor, Leopold Stokowski. He proceeds to force the tenor to hit octaves that structurally compromise the amphitheater and as it crumbles leaves him bruised and battered. Bugs is as always, victorious.

bugs

In examining Bugs’ strategy (let’s assume he actually had one), Bugs took over operations of the orchestra’s musical program to achieve his goal of getting the tenor “in-line” so to speak. As I prepare to head down to the OpenStack Conference in Austin, TX next week, I’m seeing similar patterns develop in the cloud and data center infrastructure space which are very “Bugs/Leopold-like”. With organizations deciding on how to consolidate data centers, containerize apps and move to the cloud, vendors and open source technologies offer value, however true operational, infrastructure and platform independence are not what they appear to be. For example, once you move your apps off the data center to AWS or VMware and then later determine you are paying too much or the workload is no longer is appropriate for the infrastructure, good luck replicating the configuration work done on CloudFormation on another cloud or back in the data center. Same rationale is applicable to other technologies such as converged infrastructure and proprietary private cloud platforms. As the customer, to achieve scale and remove operational pain you must fall in line. That in itself is a big commitment to make in a still-evolving and maturing technology industry and a dynamic business climate.

On an unrelated topic, I was saddened to learn of the passing of Prince this past week. While not a die-hard fan, I liked his music. He was a great composer of songs and had a style all to his own. Beyond his music and sheer talent, I admired his business beliefs and deep desire to maintain creative ownership and control of his music and his brand.

princeDespite his fortune and fame, there was a period in the middle of Prince’s career in which he felt creatively and financially locked-in by the big record companies. Once Prince (and the unpronounceable symbol) broke away from Warner Music, he was able to produce music under his own label. This action enabled him to create music without a major record label dictating when he needed to produce a new album and what it needed to sound like. In addition, he was now able to market his new recordings to the distribution platform that supported his artistic and financial goals. While still having ties to Warner Music, he was no longer bound by their business practices. Along with starting his own music subscription service, Prince cut deals with Arista, Columbia, iTunes and Sony. Prince’s music production had operational portability, business agility and choice (seven Grammy awards and 100 million record sales also help create that kind of leverage.).

While open APIs and containers offer some portability, at RackN we believe they do not offer a completely free market experience to the cloud and infrastructure consumer. If the business decides it is paying too much for AWS, it should not allow for the operational underlay and configuration complexity to lock them to the infrastructure provider. They should be able to transfer their business to Google, Azure, Rackspace or Dreamhost with ease. We believe technologies that create portable, composable operational workflows drive true infrastructure and platform independence and as a benefit, reduces business risk. Choosing a platform and being forced to use it are two very different things.

In conclusion, when considering moving workloads to the cloud, converged infrastructure platforms or using DevOps automation tools, consider how you can achieve programmable operational portability and agility. Think about how you can best absorb new technologies without causing operational disruption in your infrastructure. Furthermore, ensure you can accomplish this in a repeatable, automated fashion. Analyze how you can abstract away complex configurations for security, networking and container orchestration technologies and make them adaptable from one infrastructure platform to another. Attempt to eliminate configuration versioning as much as possible and make upgrades simplistic and automated so your DevOps staff does not have to be experts (they are stressed out enough.).

If you are attending the OpenStack Conference this week, look me up. While I am far from a music expert, i’ll be happy to share with you my insights on how to spot a technology vendor that likes to play a purple guitar as opposed to one that eats carrots and plays the banjo.

-Dan Choquette: Co-Founder, RackN

 

 

 

%d bloggers like this: