Micro-services: Knock, Knock, Knockin’ on DevNetOps’ Door by James Kelly


A version of this article was published on October 13, 2017 at TheNewStack

“You’ve got to ask yourself one question: ‘Do I feel lucky?’ Well, do ya, punk?” – Dirty Harry

Imagine that putting this famous question to the sentiment of IT deployments, it points to IT and even business performance with scary accuracy. High performers – “The most powerful guns in the world,” to borrow Harry’s words – pull the trigger on deployments with high confidence, while deployment dread is a surefire sign of lower performers, advises the State of DevOps report.

In his talks, Gene Kim, author of The DevOps Handbook, corroborates that deployment anxiety is associated with hapless businesses half as likely to exceed profitability, productivity and market share goals, and evidently with lower market-cap growth.

We may conclude high-performing teams wear the badge of confidence because they’re among the ranks in the academy of agile, deploying more often – orders of magnitude more often. Their speed is in taking lots of little steps, so they have regular experience with change.

And feeling luckier is also an effect of actually being luckier. Data show high performers break things less often; and when they do, there are more clear-cut forensics. They bring in better MTTF and MTTR than lower performers’ old-fashioned police work because the investigation and patching proceeds quickly from the last small step, instead of digging for clues in bigger deliveries peppered with many modifications.

Less, more often, is better than more, less often

Imagine conducting IT on two technology axes: time and space. If it’s clearly healthier to automate faster, smaller steps with respect to the timeline or pipeline, then consider space or architecture. Optimizing architecture design and orchestration to recruit nimble pipeline outputs, what stands out in today’s line up of characters? Affirmative, ace: micro-services.

The principles of DevOps have been around a while and in emerging practice for more than a decade, but the pivotal technologies that cracked DevOps wide open were containers and micro-services orchestration systems like Kubernetes. Looking back, it’s not so surprising that smaller boundaries and enforced packaging from developers, preserved through the continuous integration and delivery pipeline, make more reliable cases for deployment.

A micro-services architecture isn’t foolproof, but it’s the best partner today for the speed and agility of frequent or continuous deployment.

Micro-services networks and networking as micro-services

In a technical trial of service meshes versus SDN, there are three key positions networking takes in today’s micro-services scene:

  1. In a micro-services design, the pieces become smaller and the intercellular space – the network – gets bigger, busier, and hence, vital. Also, beyond zero-trust-style protection of the micro-services themselves, it’s important to have this network locked down.
  2. Service discovery, service/API gateways, service advertising with DNS, and service scale-out or -in with load balancing are all players in networking’s jurisdiction.
  3. Beyond micro-services, any state replication, backup, or analytics over an API, a volume, or a disk, also rides on the network.

Given the importance of networking to the success of micro-services, it’s ironic that networking components are mostly monolithic. Worse, deployment anxiety is epidemic: network operators have lengthy change controls, infrequent maintenance windows, and new code versions are held for questioning for 6-18 months and several revisions after availability.

A primal piece on DevNetOps cites five things we can borrow from the department of DevOps to remedy network ops in time and space, starting with code, pipelines and architecture.

Small steps for DevNetOps

Starting into DevNetOps is possible today with Spinnaker-esque orchestration of operational stages: a network-as-code model and repository would feed into a CICD pipeline for all configuration, template, code and software-image artifacts. With new processes and skills training of networking teams – like coding and reviewing logistics as well as testing and staging simulation – we could foil small-time CLI-push-to-production joyrides and rehabilitate seriously automation-addled “masterminds” who might playbook-push-to-production bigger mistakes.

The path to corrections and confidence on the technology time axis, begins with automating the ops timeline as a pipeline with steps of micro modifications.

Old indicted ops practices can be reinvented by the user community with vendor and open-source help for tooling, but when it comes to architecture, the vendors need to lead. Vendors are the chief “Dev” partner in the DevNetOps force, and motives are clearer than ever to build a case to pursue micro-services.

Small pieces for DevNetOps: Micro-services

After years of bigger badder network devices – producing some monolithic proportions so colossal they don’t fit through doors – vendors can’t ignore the flashing lights and sirens of cloud, containers and micro-services.

It’s clear for DevNetOps, like for DevOps, micro-sized artifacts are perfectly sized bullets for the chamber of an agile pipeline. But while there’s evidence of progress in the networking industry, there’s a ways to go.

Some good leads toward a solution include Arista supporting patch packages separate from their main EOS delivery; the OCP popularizing software and hardware disaggregation in its networking project; and Juniper Networks building on disaggregation by supporting node splicing and universal chassis for finer-grained modularity and management boundaries. Furthermore, data center network designs of resilient scale-out Clos network fabrics with pizza-box-sized devices are gradually favored over large aggregation devices. And in software-defined networking, projects like OpenContrail are now dispatched as containers.

In the world of DevOps, we know that nothing does wonders for deployment quality like developers threatened with the prospect of a page at 2am. But for DevNetOps that poetic justice is missing, and the 24/7 support between the vendor-customer wall hardly subdues operator angst when committing a change to roll a deployment. Moreover, the longer the time between a flawed vendor code change and the time it’s caught, the more muddled it gets and the tougher it is to pin.

The best line of defense against these challenges is smaller, more-frequent vendor deliveries, user tests and deployments. Drawing inspiration from the success of how DevOps was bolstered by micro-services, imagine if while we salute DevNetOps continuous and agile operations today, we compel vendors and architectural commissioners to uphold designs for finer-grained felicitous micro-services, devices and networks for a luckier tomorrow.

A version of this article was published on October 13, 2017 at TheNewStack


For more information on defining DevNetOps and DecSecOps, see this article and my short slideshare:

New Heroes in the DevOps Saga: DevSecOps and DevNetOps by James Kelly


This article was originally published on September 26 at

The evolution of DevOps is by no means done, but it’s safe to say that there is enough agreement and acceptance to declare it a hero. DevOps has helped glorify IT to the point where it’s no longer preventing business, nor a provider nor a partner of the business.

Often IT is the business, or its vanguard for competitive disruption and differentiation.

Splintering the success of this portmanteau hero, we now hear more and more of two trusty sidekicks: DevSecOps and DevNetOps. Lesser understood in their adolescence, these tots are still frequently misunderstood, are still forming their identities, and still need a lot of development if they’re to enter the IT hall of fame like their forerunner.

Just as the terms look, DevSecOps and DevNetOps are often assumed to be about wrapping DevOps principles around security and networking: operators hope to assuage technical debt and drudgery by automating in proficiency and resiliency. For networking, I’ve covered how there is a lot more to that than coding, but to be sure, these sidekicks certainly espouse operators learning how to do develop while DevOps was equally, if not more, about developers learning to operate.

The Shift Left: SecDevOps and NetDevOps

As if it wasn’t hard enough to tell what DevSecOps and DevNetOps want to be when they grow up, we’ve gone and given them alter egos: SecDevOps (aka “rugged” DevOps) and NetDevOps. Think about them exactly as the words look – it’s about the shift to the left. Left of what?

Traditional DevOps practices focus on business-specific applications development. The development timeline is known as concept to cash, and with all the superpowers of DevOps we try to reduce our enemy: the lead time and repeatable processes between code and cash.

Security and building infrastructure – like networks – were supporting tasks, not revenue-generating nor competitive advantages. Thus, security and networking were far to the right on the timeline with concerns that deal with operational scale, performance and protection.

Today’s shift left propels security and infrastructure considerations earlier on the timeline, into coding, architecture and pre-production systems. It’s a palpable penny-drop amid daily news of security breaches and infrastructure outages causing technology-defined establishments to bleed money and brand equity.

Fill the bucket with cash, but don’t forget to forestall the leaks!

DevOps and Infrastructure: Challenge and Opportunity

Automation sparks have flown over the proverbial wall into the camp of I&O pros. Operators trading physical for virtual, macro for micro, converged for composed, and configuration for code is proof that the fire has caught security and networking. Controlling the burn now, is key, so that healthier skills and structures arise in place of the I&O dogma and duff. Fortunately, this is precisely the destiny for our newfound heroes, DevSecOps and DevNetOps.

However, doing DevSecOps and DevNetOps, embracing security and networks as code, we mustn’t be so credulous as to forget the formidable DevOps practices and patterns that need transforming along the ultimate automation journey. Testability, immutability, upgradability, traceability, auditability, reliability, and other __abilities are not straightforward to achieve.

Discounting “aaS” technology consumed as a service, a fundamental challenge to innovating SecOps and NetOps, compared to application ops, is that applications are crafted and built; security and networking solutions are mostly still bought and assembled.

Security and network infrastructure as code is something that needs to be co-created with the vendors. Other than in the cloud, it will take a while before security and networking systems are driven API-first, and are redesigned and broken down to offer simulation, composition and orchestration with scale and resilience.

While this will land first in software-defined infrastructure, there is still a ways to go to manage most software-defined security and networking systems with continuous practices of artifact integration, testing, and deployment. Hardware and embedded software will be even more challenging.

Finding Strength in Challenge

So on one hand, DevOps is evolving with security and networking shifting left. On the other hand, traditional security and networking ops are transforming with DevOps principles.

Is the ultimate innovation to squeeze out those traditional operations altogether? Does NetDevOps + DevNetOps = DevOps?

There is a parallel train of thought and debate, with success on both sides. Purist teams cut out operations with the “you build it, you run it” attitude. Other companies like Google have dedicated operations specialist teams of SREs. While the SRE reporting structure is isolated, SRE jobs are very integrated with that of development teams. It’s easy to imagine the purist approach, subsuming security and networking into DevOps practices, but only if we assume the presence of cloud infrastructure and services as a platform. Even then, there is still substantiation for the SRE.

Layers below, however, somebody still needs to build the foundations of the cloud IaaS and data center hardware. As they say, “Even serverless computing, is not actually serverless.”

Underpinning the clouds are data centers. And then there’s transport, IoT, mobile or other secure networks to and between clouds. In these areas, it’s obvious there is a niche for our two trusty sidekicks, DevSecOps and DevNetOps, to shake up ops culture and principles. These two heroes can rescue software-defined and physical infrastructure from the clutches of so many anti-pattern evils, like maintenance windows and change controls (ahem, it’s called a “commit”).

We may not require rapid experimentation in our infrastructure, but we would warmly welcome automated deployments, automated updates, failure and attack testing drills, and intent-driven continuous response. They will boost resiliency and optimization for the business and peace of mind for the builders.

Teams operating security, networks, and especially clouds, need to honor and elevate DevSecOps and DevNetOps, so that on the journey now afoot, our teams and our new heroes may realize their potential.

This article was originally published on September 26 at

It’s the End of Network Automation as We Know It (and I Feel Fine) by James Kelly

This article was originally posted on July 5th on The New Stack at

Network automation does not an automated network make. Today’s network engineers are frequently guilty of two indulgences. First, random acts of automation hacking. Second, pursuing aspirational visions of networking grandeur — complete with their literary adornments like “self-driving” and “intent-driven” — without a plan or a healthy automation practice to take them there.

Can a Middle Way be found, enabling engineers to set achievable goals, while attaining the broader vision of automated networks as code? Taking some inspiration from our software engineering brethren doing DevOps, I believe so.

How Not to Automate

There’s a phrase going around in our business: “To err is human; to propagate errors massively at scale is automation.”

Automation is a great way for any organization to speed up bad practices and #fail bigger. Unfortunately, when your business is network ops, the desire to be a cool “Ops” kid with some “Dev” chops — as opposed to just a CLI jockey — will quickly lead you down the automation road. That road might not lead you to those aspirational goals, although it certainly could expand your blast radius for failures.

Before we further contemplate self-driving, intent-driven networking, and every other phrase that’s all the rage today (although I’m just as guilty of such contemplation as anyone else at Juniper), we should take the time to define what we mean by “proper” in the phrase, “building an automated network properly.”

If you haven’t guessed already, it’s not about writing Python scripts. Programming is all well and good, but twenty minutes of design often really does save about two weeks of coding. To start hacking at a problem right away is probably the wrong approach. We need to step back from our goals, think about what gives them meaning, apply those goals to the broader picture, and plan accordingly.

To see what is possible with automation, we should look at successful patterns of automation outside of networking and the reasons behind them, so we may avoid the known bad habits and anti-patterns, and sidestep avoidable pitfalls. For well-tested patterns of automation, we needn’t look any further than the wealth of knowledge and experience in the arena of DevOps.

It matters what we call things. For better or worse, a name focuses the mind. The overall IT strategy to improve the speed, resilience, quality and intelligence of our applications is not called automation or orchestration. While ITIL volumes make their steady march into museums, the new strategy to enable business speed and smarts is incontrovertible, and it’s called DevOps.

Initially that term may invoke a blank page or even a transformation conundrum. But you can learn what it means to practice successful DevOps culture, processes, design, tooling and coding. DevOps can define your approach to the network, and is why we ought not promote network automation (which could focus the mind on the wrong objectives) and instead talk about DevNetOps as the application of patterns of DevOps applied to networking.

Networks as Code

The idea of infrastructure-as-code (IaC) has been around for a while, but surprisingly has seldom been applied to networking. Juniper Networks (where I hang my hat) and other networking vendors like Apstra have made some efforts over the years to move folks in this direction, but there is still a lot of work to do. For example, Juniper has had virtual form factors of most series of hardware systems, projects like Junosphere for network modeling in the cloud (many of us now use Ravello), and impressive presentations on IaC and professional services consulting. Juniper’s senior marketing director Mike Bushong (formerly with Plexxi) wrote about the network as code back in 2014.

IaC is generally well applied to cloud infrastructure, but it’s way harder to apply to bare metal. For evidence of this, just look at Triple-O, Kubernetes on Kubernetes, or Kubernetes on OpenStack on Kubernetes! That bottom, metal layer is quite the predicament.

To me, this means in networking, we should be easily applying IaC to software defined networking (SDN). But can we apply it to our network devices and manage the physical network? I asked Siri, and she said it was a trick question. As an armchair architect myself, I don’t have all the answers.  But as I see it, here are some under-considered aspects for designing networks as code with DevNetOps:

1. Tooling

In tech, everyone loves shiny objects, so let’s start there. Few network operators — even those who have learned some programming — are knowledgeable about the ecosystem of DevOps tooling, and few consider applying those same tools to networking. Learning Python and Ansible is just scratching the surface. There is a vast swath of DevOps tools for CI/CDsite reliability engineering (SRE), and at-scale cloud-native ops.

2. Chain the tool chain: a pipeline as code

When we approach the network as code, we need to consider network elements and their configurations as building blocks created in a code-development pipeline of dev/test/staging/production phases. Stringing together this pipeline shouldn’t be a manual process; it should be mostly coded to be automatic.

As with software engineering, there are hardware and foundational software elements with network engineering, such as operating systems that the operator will not create themselves, but rather just configure and extend. These configurations and extensions, with their underlying dependencies, can be built together, versioned, tested, and delivered. Thinking about the network as an exercise in development, automation should start in the development infrastructure itself.

3. Immutable infrastructure

Virtualization and especially containers have made the concept of baking images very accessible, and immutable infrastructure popular. While there is still much work to do with network software disaggregation, containerization, and decoupling of services, there are many benefits of adopting immutable infrastructure that are equally applicable to networking. Today’s network devices are poster children for config drift, but to call them “snowflakes” would be an insult to actual snowflakes.

Applying principles of immutable infrastructure, I imagine a network where each device PXE-boots into a minimal OS and runs signed micro-service building blocks. Each device has declarative configs, decoupled secret management and rotation, and logging and other monitoring data with good overall audit ability and traceability — all of which is geared to take the network off the box ASAP.

Interestingly, practices such as SSH’ing into boxes would be rendered impossible, and practices that “savvy” network automators do today like running Ansible playbooks against an inventory of devices would be banished.

4. Upgrades

Upgrades to network software and even firmware/microcode on devices could be managed automatically, by means of canary tests and rolling upgrade patterns. To do this on a per-box or per-port basis, or at finer levels of flows or traffic-processing components, we need to be able to orchestrate traffic balancing and draining.

If that sounds complex, we can make things simpler. We could treat devices and their traffic like cattle instead of pets, and rely on their resilience. Killing and resurrecting a component would restart it with a new version. While this is suitable for some software applications, treating traffic as disposable is not yet desirable for all network applications and SLAs. Still, it would go a long way toward properly designing for resilience.

5. Resilience

One implication of all this is the presence of redundancy in the network paths.  As with any networking component, that’s very important for resilience. Drawing inspiration from scale-out architectures in DevOps and the microservices application model, redundancy and scale would go hand-in-hand by means of instance replication. So redundancy would neither be 1:1 nor active-passive. We should always be skeptical of architectures that include those anti-patterns.

Good design would tolerate a veritable networking chaos monkey. Burning down network software would circuit-break to limit failures. Killing links and even boxes, we would quickly re-converge as we often do today, but dead boxes, dead SDN functions or dead virtual network functions would act like phoenix servers, rising back up or staying down in case of repeated failures or detected hardware failures.

The pattern for preventing black-swan event failures is to practice forcing these failures, and thus practice automating around them, so that the connectivity or other network service and its performance is tolerant and acceptably resilient on its own SLA measuring stick, whatever the meta-service in question may be.

Doing DevNetOps

In each one of these above topics lies much more complexity than I will dive head-first into here. By introducing them here, my aim has been to demonstrate there are interesting patterns we may draw from, and some operators are doing so already. If you’ve ever heard the old Zen Buddhist koan of the sound of one hand clapping, that’s the sound you’re likely to hear from your own forehead, once the obviousness of applying DevOps to DevNetOps hits you squarely in the face.

Just as the hardest part of adopting DevOps is often cited to be breaking off one manageable goal at a time and focusing on that, I think we’ll find the same is true of DevNetOps. Before we even get there in networking, I think we need to scope the transformation properly of applying DevNetOps to the challenges and design of networking, especially with issues of basic physical connectivity and transport.

While “network automation” leads the mind to jump to things like applying configuration management tooling and programming today’s manual tasks, DevNetOps should remind us that there is a larger scope than mere automation coding.  That scope includes culture, processes, design and tools. Collectively, they may all lead us to a happier place.

This article was originally posted on July 5th on The New Stack at

Title image of a Bell System telephone switchboard, circa 1943, from the U.S. National Archives.