network automation

Software-Defined, a Decade Later by James Kelly


The first African-American president had been sworn in, the king of pop had just passed, @realDonaldTrump had taken to tweeting, and Avatar was about to smash box-office records, but what I remember about the summer of 2009 is the dawn of software-defined.

Looking forward to tapas and sunglasses, I landed in Barcelona to present at SIGCOMM on what was then a niche topic: network programmability.

I was feeling good about my presentation on Service Creation with the Junos SDK (now known as JET). I’d trained many people on it, so I knew everything from its use cases to APIs like the back of my hand. Gung-ho, as I shook hands with people in the room, I was greeted with enthusiastic faces.

A decade later, I don’t remember the talk. I do remember its Q&A. Raised hands eclipsed my talk with many questions comparing our SDK to a similar demo and presentation from the day before that I had missed. Well I’ll be damned. I had followed—you guessed it—OpenFlow.

For me that was SDN’s big bang moment. The ensuing unbridled enthusiasm for OpenFlow percolated up from this academic setting into a frenzy of new foundations and projects, controllers, APIs, and a smorgasbord of overlay and control protocols. As the networking industry was soon flush with SDN startups, many established players SDN-washed anything that resembled software, and soon that spread to a software-defined movement that impacted all things infrastructure.

Back then, “en primeur” SDN looked like a dicey proposition, but its preteen years have seen it mature in the data center and WAN. Today, software-defined is almost a norm, expected to be poured into all places in network, to keep up with the operational reliability and speed demanded of these times of cloud-charged disruption.

As multicloud infrastructure, the new IT platform, permits and needs greater automation, digital operations teams are impelled to a new status quo: software-defined and AI-driven. While we still hear of SDDC, SD-WAN and SD-Branch, expectations today are that all places in network are ruled with software. And soon, if not already, that software must be AI-fueled for smarter AIOps and reliability engineered like DevOps.

Putting aside expectations and buzzwords, let’s objectively look at how SDN is moving along. 2019’s litmus test is a far cry from the old standby of control and data plane separation.

How is SDN Evolving in the Open?

It’s impossible to write an account of SDN without giving meaningful consideration to openness.

In some enterprise engineering teams, closed and proprietary solutions are contraband, but when it comes to SDN, there are few open-source projects simple enough to operationalize right off the shelf of GitHub or DockerHub. Commercially available products still carry the day, but a predilection for open source hangs heavy in the air as it brings the benefits of a community for discussion and sharing of automated workflow tools, frameworks, tests and playbooks.

While open source can introduce economies of multi-vendor engineering, testing and co-creation with code-savvy customers, it’s not replacing open standards. If such a replacement is looming, it’s narrowly in the cloud-native space of securely connecting Kubernetes application clusters and service meshes. But where SDN meets hardware top to bottom, and where multiple SDNs meet to federate and span domain boundaries end to end, interoperability isn’t yet forged in open source. Multi-vendor interoperability and multi-SDN system federation hinge on open standards-based protocols, especially those truly proven and widespread standards.

Interoperability prevents technical debt and increases customer freedom. However, customer freedom is an unlikely strategy for the largest vendor incumbents, which is why we still see some rival SDNs bound to vendor hardware and vendor workload orchestration systems.

Interoperability should also not be confused with overlays, which are merely agnostic to the IP network on which they ride. Overlays provide separation of concerns, and like interoperability, they afford a way to insert into brownfield environments. But without interoperable network boundaries, overlay SDN solutions are islands eventually destined as technical debt, the ball and chain to IT progress.

Security is Shifting Left

SDN openness may be in various shades of optional, but everyone understands security is a must because from the network engineer to the board of directors, people today are acutely aware of the specter of security breaches and attacks.

Whether it is multi-tenancy, micro-segmentation, mutual identity authentication, or secure SD-WAN based on next-gen firewalls, in software development and network development security has shifted left, moving it earlier on the timeline of project considerations. Infosec used to be a rubber stamp at the end of projects; now security is foundational and a first priority.

With the progression of SDN, instead of only traditional secure perimeters, network security is now getting measured out in coarse- and fine-grained means for multi-factor defense in depth. And in the race against advanced and automated threats, software-defined systems have also made it simpler to manage security policies and automatically enforce them by applying protection within the network faster than ever before.

How is SDN Evolving Automation?

In the aftermath of the early SDN hype, the industry experienced progression on the front of orchestrating operations and regression from the focus on automating them.

The evolution of controllers made automation turnkey, focusing on what instead of how. In data centers, the dynamic machinery inside of the network was automated in step with workflow orchestrators like vSphere, OpenStack, and today, more so Kubernetes and OpenShift. In the case of SD-WAN, orchestration meant dealing in zero-touch branch onboarding and choosing WAN policy levels for application traffic. Automation to regulate traffic steering across the hybrid WAN uplinks was baked in.

These are oversimplifications of SDN applications, but across SDN domains, the common thread is that 1. control is centralized and abstracted above the device level to span a distributed fleet of infrastructure; and 2. controllers automate workflows, orchestrating, reducing and simplifying steps. The industry attention spent adapting to software-defined has, until recently, been mostly about these benefits of moving from commanding device CLIs to orchestrating controller GUIs. But orchestration implies automated networks, and changes nothing about automated network-ing. Automated network operations (NetOps) is the next frontier.
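The "what instead of how" idea can be sketched as a tiny reconcile function: the operator declares the desired state and the controller works out the per-device steps. This is a hypothetical illustration, not any vendor's controller API; the `plan_changes` function, the VLAN-set data model and the device names are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data model: device name -> set of VLAN IDs configured on it.
State = Dict[str, Set[int]]


def plan_changes(desired: State, actual: State) -> List[Tuple[str, str, int]]:
    """Derive the per-device steps that move `actual` to `desired`.

    The operator only declares the "what" (desired state); this function
    works out the "how" as an ordered list of (device, action, vlan) steps.
    """
    steps: List[Tuple[str, str, int]] = []
    for device in sorted(set(desired) | set(actual)):
        want = desired.get(device, set())
        have = actual.get(device, set())
        for vlan in sorted(want - have):
            steps.append((device, "add", vlan))
        for vlan in sorted(have - want):
            steps.append((device, "remove", vlan))
    return steps


# Illustrative fleet: leaf1 has a stale VLAN 30, leaf2 is missing VLAN 10.
desired = {"leaf1": {10, 20}, "leaf2": {10}}
actual = {"leaf1": {10, 30}, "leaf2": set()}
steps = plan_changes(desired, actual)
for step in steps:
    print(step)
```

Running the same plan against a fleet that already matches the desired state yields no steps, which is the property that makes this style of automation safe to repeat.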

While workflows are leveled up and centralized with SDN, so are workflow APIs, another key aspect of SDN systems. With this, there is also the possibility to automate NetOps: evolving to automated testing, troubleshooting, change controls, service-level indicators, and service-level reliability. This is an area of network automation focused less on the automation technology inside software-defined networks, and more on the processes and people that will software-define operations.

In this transformational movement that network engineers have dubbed network reliability engineering (NRE), a subject we at Juniper have promoted a good deal, technology also plays an important role. SDN systems are important so that engineers are building on SDN abstractions and workflows and doing so with centralized APIs instead of down at each device.

Analytics and AI Are Driving the Future of Automation

Big data analytics, machine learning and AI are certainly sensationalized, but that’s because they show promise. If they can be applied to customer trends, voter preferences, and many other fields with complex data points, they can also be applied to IT. A prerequisite to this is already found in SDN systems: a central point of management. And networks are teeming with data to be centrally collected and processed.

Many SDN systems already incorporate telemetry, analytics and some of them leverage AI. If they don’t, you can bet it’s on the roadmap. And analytics and AI are important ingredients in automation, both within SDN systems and for improving the operations-contextual automated NetOps on top of SDN.

Since Juniper just acquired Mist, a timely example is the AI-driven Mist Cloud. Combining data points from Wi-Fi and BLE radio, wired, and wide-area networks, the Mist Cloud software uses AI to crunch location-addled degradations and dynamics. It’s able to stabilize network reliability and raise user experience problems, even outside of the WLAN. On the NetOps side of things, Mist’s AI-powered assistant, Marvis, smartens up the human touch and mitigates mistakes.

Another example is found in network reliability engineering. In the data-driven SRE and NRE culture, you can’t manage what you can’t measure. Many SDN systems have higher-order reporting, metrics and centralized telemetry APIs, aiding in the creation of service-level indicators (SLIs). SLIs are important because they map to objectives about the stakeholder-oriented service levels instead of SNMP or metrics that are meaningless to those outside of networking.
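As a rough sketch of what an SLI can look like in practice, the snippet below turns raw probe results into a single stakeholder-facing number and compares it with an objective. The `Probe` record shape, the 100 ms latency budget and the 99% objective are assumptions chosen for illustration; a real system would pull these measurements from its own telemetry APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Probe:
    """One synthetic check of a service path (hypothetical record shape)."""
    success: bool
    latency_ms: float


def availability_sli(probes: List[Probe], latency_budget_ms: float) -> float:
    """Fraction of probes that succeeded within the latency budget."""
    if not probes:
        raise ValueError("no probes to evaluate")
    good = sum(
        1 for p in probes if p.success and p.latency_ms <= latency_budget_ms
    )
    return good / len(probes)


# Illustrative data: two good probes, one failure, one too slow.
probes = [
    Probe(True, 12.0),
    Probe(True, 48.0),
    Probe(False, 0.0),
    Probe(True, 250.0),
]
sli = availability_sli(probes, latency_budget_ms=100.0)
slo = 0.99  # assumed service-level objective
print(f"SLI={sli:.2%}, meets SLO: {sli >= slo}")
```

The point of the design is that the result is expressed in terms a service owner cares about (good requests over total requests) rather than in device counters.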

NREs also manage errors and outages by automating around them for continuous improvement. As SDN systems begin to incorporate AI-powered predictive analytics, NREs can use those signals to better uphold service levels with automated proactive steps to complement reactive remediation.

Clouds are in the Forecast

The cloud is replete with as-a-service IT offerings, but many SDN systems are still “shrink wrapped software” as downloads to run on premises or in private clouds.

Many networking teams would like to use or at least evaluate SDN, but jumping headlong into unboxed software like SDN is not accessible to smaller networking teams when they have the added work of learning to run and maintain SDN systems. For this reason, cloud-delivered SDN offerings are kindling more SDN adopters. This week, Juniper announced that it is taking its SD-WAN solution and launching the option for Contrail SD-WAN as a service.

Cloud-based SDN for all domains of networking seems interesting but won’t be right for all customers nor all use cases. Cloud-based data collection comes with mixed benefits and drawbacks.

For some, sending telemetry data to the cloud may be prohibited or prohibitively inefficient, especially in cases where certain telemetry requires a quick reaction. On the other hand, to the cloud’s benefit, if many networking teams don’t have the means to operate SDN, then they surely don’t have the means to run sophisticated analytics or AI systems on premises. A cloud service solves for simplicity. It also solves for quicker vendor innovation cycles and smarter processing because the economies of scale for data storage, analytics and now hardware-assisted AI processing based in the cloud are enormous and incomparable.

In summary, the first software-defined decade has been spectacular, and seasoned solutions are seeing a lot of production. Now, with so much contributing to SDN’s innovation and reinvention, I look forward to the next odyssey and seeing what will define software-defined. Drop a comment below to share your own anticipations.

Podcast of this blog on YouTube.

image credit moritz320/pixabay

The Tale of the Network Automator: From Knight Errant to Engineer by James Kelly


Podcast on YouTube

Perchance the temptation of technology and for certain the pangs of toil have persuaded some networkers to stray from the path of the CLI and *IE certification race.

Swinging APIs, they’ve heard, is the promised land, and wielding shiny tools like Ansible can not only confirm their chivalry but also chase away any lament of rote configuration changes and misfortunes of fat fingers.

“Tonight, Sancho, we drink from the cup of automation. And we will be safely protected from any insurgent application issues rising to the heights of our virtuous network.”

Chapter 1: Initiation

And so once upon a time, after the ecstasy of donning the automation helmet and downloading a few horses, networkers looking for a rite of passage into the world of automation looked bravely in the mirror and said to themselves, “what is there to do that’s dangerous around here?”

The way many networkers made haste to an automation quest was ostensibly with all the strategy of the ad hoc adventures of a knight errant, looking to prove their worth.

These trailblazers were defined by the natural signposts in their workflows, by vendor toolkits, and by tools imported from the far away land of sysadmin. And so, often under the tutelage of nothing but their own trial and error, the networker knights errant learned to conquer a couple of the workflows that they were presented with, slaying manual steps one by one.

Chapter 2: Calamity

Shortly thereafter, while riding proudly toward a well-deserved weekend, the knights were so enchanted with their new weaponry and so pleased with themselves (their heads, swollen with victories, may have pressed tight against their helmets) that their attention and vision were impaired. And it was while fantasizing of their next rout that the slings and arrows started flying.

The actual assault is not important: a sudden unreachability, an impetuous line-of-business change request, a cloud VPC meltdown. It was as unexpected as it was inevitable. For some of the knights the affliction would uncover a weakness in armor. For others, it may have been the sting of the automation sword that was erratically swung and cut the wrong way, this time automating and thus swiftly amplifying an accident.

Chapter 3: Dedication

For some beginners, the setback of an automation blunder was so vexing that they turned back to peasant life. For many however, after finding their feet again and after some commiseration with knight companions, they found the strength to press on, carrying the lesson of their lapse forward to not repeat that unforgettable venomous mistake.

In knighthood, the networkers aspired to be so valiant and so adroit that they got the notion that if they’re doing it right, they won’t have difficulties. But difficulties, it turns out, are precisely the best circumstances for learning.

Ergo, the knight automators that persisted and persevered learned to be magnanimous with themselves first and foremost. To uphold future reliability, they resolved to automate around the woes that wounded them. In this way, an indiscretion, once defeated with automation, would never weasel its way back to a second sour encounter.

Chapter 4: Trials

Again and again in the hardship of automating their accomplishments, the networker knights became more astute and attuned to their environment. They began forging their own armaments and virtual squires to do their bidding. They also became less foolish and blunt on their escapades, taking what was useful from other toolsmiths and intrepid automators with software engineering skills.

In their training, the knights’ tenacity grew used to absorbing the lessons of failure, but they grew tired of the disrepute and thorns of trial and error.

As fortune would have it, one day they encountered software sages that spoke of staged replica battles. “But what is the use of repeating the past if we have integrated its teachings?” the knights inquired. “Not the past,” explained a sage. “We stage in preparation for any future change and crusade with many possibilities accounted for and tested.” Testing and practice ahead of affronting a matter: it sounded ingenious, and so the knights too started building training grounds to rehearse their affairs.

Through testing, stressing and staging, and then automating and strengthening that preparation work too, the knights became known for their fastidious preparation and resulting dependability to conquer new projects and change conquests in production, and now with less havoc than sorely familiar from the past. Soon they became so mighty that their automation could take on more incessant action and circumstances of increasing variety.

Chapter 5: Consummation

Becoming even more zealous about automating trials not to error, the knights decided to consummate their erudition and experience by taking new identities as knight engineers.

See, each knight’s struggles and the many stories shared at the inns gradually made them less wandering and more disciplined and deliberate about their path. Their practice became more ritual and rigorous, and their automating more rewarding. Through their evolving wisdom, they had transformed from errant to engineer, worthy of an order or iron ring.

In due course, through tournaments and their own track record, the knight engineers decided they could not be passionate, proud engineers unless they could also measure and show their success. After all, they vowed reliability to others. And so, they decided to build public displays for every pledge of reliability that would maintain accurate assessments of their results. Constructed with even more automation, these displays objectively told of the current conduct and past performance of the systems in their lands.

Ultimately, so well-known these reliability signs of service became, that the ardent automation engineers are today called network reliability engineers.

Coda: The Canons of NRE

There are two discernable standards to which the knights held.

First, reliability was their utmost value. They determined that at the base of the hierarchy of needs for their service, if the network was not measurably reliable in the ways they promised, nothing else mattered. Instead of trading off reliability with other values like agility, efficiency and so on, these boons were incidental because they first solved for reliability with automation.

Second, engineering is the best road to take as an automator. The processes and skills of software engineering guide far better than getting consumed with happened-upon technology, tools and APIs. Networking workflows are the battleground in which to practice automating and a brilliant place to start ad hoc, but ultimately the rigors of software engineering will provide strategy and structure for the tactics and tools.

For more on the culture, skills, processes, behaviors and common technologies of network reliability engineer (NRE) roles and teams, look to the famous older cousins: site reliability engineers (SRE). Check out the free online SRE book and my past blogs.


Thanks to the NREs, SREs and network automator forebears for your inspiration, enlightenment and advice in putting together the 5-step journey to automated NetOps. With that guide, may the path of forthcoming automators be smoother and more straightforward than your own story.

Cliff Notes Translation

For those unfamiliar with the famous humor and satire of the book Don Quixote, the above story and style may require some explanation. However, I suspect that those even the slightest bit attentive to the network automation space can see the parallel between this metaphor and real life.

Here are the Cliff Notes of the stereotypical story:

Chapter 1: Not looking before they leap into automation, many people progress in a rather ad hoc manner, turning tribal knowledge and apparent workflows into an opportunity to learn while aggregating manual tasks. This is mostly governed by the tools they know or those put in front of them, which may or may not be the best tools for the jobs at hand.

Chapter 2: Many people get stung by automation in one way or another. Especially with config management automation, one small issue could easily get propagated to a massive blast radius. Change workflows aren’t the safest place to learn, but they are the most infamous for some reason. Automating directly in production when learning and without testing is also a catastrophe waiting to happen.

Chapter 3: Continuous improvement and learning is always the foundation of good automation. People learn that small changes and sometimes small mistakes are where the lessons lie to help discover what to automate next. This can be taken all the way to NRE concepts like chaos engineering, where failure is induced on purpose.

Chapter 4: The rigors of software engineering like test-driven engineering, continuous integration and delivery (CI/CD) and automated deployment are critical to success and safely moving quickly, instead of trading off reliability for velocity. Automation is learned in pre-production environments like labs or virtual labs. Over time, engineers also create well-tested in-production continuous response.

Chapter 5: Instead of relying on only general monitoring tools, NREs, like SREs, engineer what matters most: service-level indicators to objectively measure success of higher-order service goals and promises. With the help of data, they manage their way to truly measured continuous improvement.
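The lessons of Chapters 2 and 4 can be sketched together: test a candidate change before it goes anywhere near production, then limit the blast radius by rolling it out in small batches that halt on the first sign of trouble. This is a minimal illustration under assumed names, not a real tool. `validate_interfaces`, `staged_rollout`, the interface-to-address data model and the `r1`..`r8` device names are hypothetical, and the two callables stand in for whatever tooling actually pushes and verifies changes.

```python
import ipaddress
from typing import Callable, Dict, List, Sequence


def validate_interfaces(interfaces: Dict[str, str]) -> List[str]:
    """Pre-production check of {interface: "address/prefix"}; returns problems."""
    errors: List[str] = []
    seen = {}  # interface name -> network already accepted
    for name, cidr in sorted(interfaces.items()):
        try:
            network = ipaddress.ip_interface(cidr).network
        except ValueError:
            errors.append(f"{name}: invalid address {cidr!r}")
            continue
        for other, net in seen.items():
            if network.version == net.version and network.overlaps(net):
                errors.append(f"{name}: overlaps with {other}")
        seen[name] = network
    return errors


def staged_rollout(
    devices: Sequence[str],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    batch_sizes: Sequence[int] = (1, 2, 4),
) -> List[str]:
    """Apply a change in growing batches; halt after the first unhealthy batch."""
    if not batch_sizes or min(batch_sizes) < 1:
        raise ValueError("batch sizes must be positive")
    changed: List[str] = []
    remaining = list(devices)
    index = 0
    while remaining:
        size = batch_sizes[min(index, len(batch_sizes) - 1)]
        batch, remaining = remaining[:size], remaining[size:]
        for device in batch:
            apply_change(device)
            changed.append(device)
        if not all(is_healthy(device) for device in batch):
            break  # stop here so the mistake is not amplified fleet-wide
        index += 1
    return changed


# A broken candidate is caught before any device is touched.
broken = {
    "ge-0/0/0": "10.0.0.1/30",
    "ge-0/0/1": "10.0.0.2/30",
    "ge-0/0/2": "10.0.0.300/24",
}
problems = validate_interfaces(broken)
print(problems)

# A clean candidate is rolled out; r3 is simulated as unhealthy afterwards.
candidate = {"ge-0/0/0": "10.0.0.1/30", "ge-0/0/1": "10.0.1.1/30"}
fleet = [f"r{i}" for i in range(1, 9)]
pushed: List[str] = []
changed: List[str] = []
if not validate_interfaces(candidate):
    changed = staged_rollout(fleet, pushed.append, lambda d: d != "r3")
print(changed)
```

In this run only three of the eight devices are ever changed, because the rollout stops after the batch containing the unhealthy device.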


image credit blitzmaerker/pixabay

Seven Deadly Deceptions of Network Automation by James Kelly


Podcast on YouTube

The greatest deception men suffer is from their own opinions. - Leonardo da Vinci

“Network automation does not an automated network make.” Those same words started my formative piece on DevNetOps. It reasoned that we must elevate DevOps culture, processes and principles above technology, end random acts of network automation, and instead pursue holistically automated network engineering and operations. The professional that implements this—from code to production—is the network reliability engineer (NRE). The NRE implements DevNetOps for network infrastructure just as the SRE implements DevOps for apps and platforms.

It’s been a journey discovering DevNetOps and network reliability engineering. With help from peers and NRE friends, I’ve faced debate and dogma forged in the fiery cynicism of the networking I&O silo. To share these lessons, let’s overturn some anti-patterns and deceptions, starting with the opposite of the NRE: those who say automation is “not for me.”

1. It’s not for me

You used to hear people say, “We’re not Google. We don’t have those problems or need those solutions.” Today everyone is mad for #GIFEE and racing for the same outcomes as the unicorns. If you think you’re a thoroughbred horse in a different race, you’re utterly deceiving yourself, and your business is heading to the glue factory.

Before we can change our minds, we must open our minds. Life is an inside job.

If you're a network admin, the rationale to retitle yourself as an NRE is right in front of you. Look forward. You’ll see a future less doldrum, more creative, and one where you have more control over your own destiny and that of your organization. And more pay and job opportunities too. Yes, NRE is already an actual job title.

With retitling comes reform. You used to rely on vendors for all network engineering, but this relegated operations people to technicians instead of technologists. As an NRE, you don’t need to hop over the proverbial dev-ops wall to engineer boxes and SDN systems. You just need to lower the wall and pick up where vendors leave off. Their day of product delivery to market and to you is your day zero, where you automate not only integration workflows but also outcomes like accuracy, reliability, scale, efficiency and ops speed.

2. It’s all about automation and technology

Rod Michael said, “If you automate a mess, you get an automated mess.” Automation must follow architecture and accuracy.

It’s common for builders to want to build, but you cannot be so swift as to forget the blueprints. There’s a balance to strike between build and design. To be sure, a DevOps mindset promotes build iteration, as did Mark Twain when he said, “Progressive improvement beats delayed perfection.”

Of course it’s not all planning processes or forging culture, but technocrats tend to obsess over technology too much. Network reliability engineering and DevNetOps are not only about technology, just as racing is not only about cars.

3. It’s only about open source

I’m a proponent of open source and believe it aligns with human nature. From GitHub to the growth of the CNCF and so many other projects, the open-source watershed has hit.

However, especially in networking, there is plenty of closed source. Fortunately, open APIs make integration and automation possible even for closed systems. Today, open APIs are increasingly commonplace because they’re not a nicety—they’re a necessity.

Moreover, the as-code, GitOps, and CI/CD movements shine the automation spotlight onto pre-production pipelines and processes. These trends are supported by and apply equally to open- and closed-source software, so don’t let closed source deter your DevNetOps desires.

4. It’s incompatible with ITIL, InfoSec, I&O or hardware

You might believe there’s no need for infrastructure rapid iteration, agility and experimentation. But just because you don’t need all the benefits of a DevOps culture, doesn’t mean you don’t need any.

You may also deceive yourself, thinking networking is different. But just because network hardware is more foundational than application software and less flexible today, doesn’t make DevNetOps ideas impossible. It’s precisely because networking is foundational that having it automated is crucial and will add simplicity and flexibility.

First of all, there is a large software side to networking—SDN, NFV and network management—where we can more easily apply DevNetOps behaviors. Translating some behaviors like CI/CD and chaos engineering to network devices, however, isn't straightforward. In a past article on TheNewStack, I examined the difficulties aligning Agile to today’s architectures in network operating systems, boxes and topologies. In re-architecting networking for DevNetOps, we ought to draw inspiration from microservices—a catalyst for the traditional DevOps transformation—because smaller architectural units allow for smaller, safer, and speedier steps of change.

Finally, many DevOps practitioners have overcome organizational policy “barriers” like ITIL and InfoSec. As well established in The DevOps Handbook, success lies not in rallying anarchy; rather, the DevOps principles automate in security, compliance and consistency.

5. It’s not obvious where to start

The territory is now increasingly marked with maps: training and didactic case studies. But don't mistake studying for starting. Complement your wonder with some wander. Try playing with git. Sharpen up your programming fingers. Give that tool a whirl.

There are many paths to success. Even if your journey is serpentine, even if you lose true north, you may pick up useful tools and lessons in unexpected places.

Like building any new habit, it’s useful to have a buddy, or better yet a two-pizza team. You'll progress quickest in green fields. Choose a team project with no technical debt when you’re just starting out and take small wins and small risks. When you allow for failure and iteration, you record lessons into processes and automated systems, and you grow people.

The easiest place to start is at the beginning of the project stream, where it is small, not down in the ocean. Start at day-0 and flow from pre-production to production. Build as simple an automated pipeline as possible to integrate artifacts, secrets and configuration as code. As you mature, expand the middle: pipeline orchestration, building, testing, integration, more testing, immutable deliveries, and finally orchestrated deployments into staging and then production. Eventually beyond network and SDN automated deployments, you will have other in-production automation extensions for systems integration and event handling that can follow the same pipeline.
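As a starting point, "as simple an automated pipeline as possible" can be as small as an ordered list of stages that stops at the first failure, so that nothing reaches production unless every earlier stage passed. The sketch below is an assumption-laden illustration rather than a real CI system: `run_pipeline` and the stage names are hypothetical, and each stage is a stub that a real pipeline would replace with linting, tests and deployment tooling.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure; return a log."""
    log: List[str] = []
    for name, step in stages:
        ok = step()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break  # later stages, including production, never run
    return log


deployed: List[str] = []


def deploy_production() -> bool:
    """Stub for a production deployment; records that it was reached."""
    deployed.append("production")
    return True


# Hypothetical stages; staging is simulated as failing its checks.
stages: List[Stage] = [
    ("lint-config", lambda: True),
    ("unit-tests", lambda: True),
    ("deploy-staging", lambda: False),
    ("deploy-production", deploy_production),
]
log = run_pipeline(stages)
for line in log:
    print(line)
```

Expanding the middle of the pipeline later means adding entries to the list, not rewriting the runner.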

6. It’s all about speed

I wanna go fast! - Ricky Bobby

Keeping up with the pace of technology is ever harder. And so goes the saying, “the future belongs to the fast.” But when it comes to automation, the NRE title tells us something very important: we must focus on reliability.

Speed alone will never win a race, and speed without reliability is a glorious way to crash and burn, just ask rocket scientists. If you were a racecar driver—one smarter than Ricky Bobby—you would say that to finish first, you must first finish.

A twin burden today, as confronting as the need for speed, is complexity. You know, if Dijkstra, a networking hero for his SPF algorithm, were alive today, he would be a champion of network reliability engineering simplicity (a coincidental portmanteau-ing of NRE and the Juniper anthem) because of his famous quote, “Simplicity is prerequisite for reliability.”

In summary, we need speed, and we need smart. We must be consistent with simplicity, effectiveness, efficiency and reliability (...smart) while employing the economies of velocity, agility, scale and reach (...speed). We all love going fast, but it's not how fast you drive, it's how you drive fast.

7. It’s all about DevNetOps & NRE

The hype of DevOps and SRE is probably warranted if you seriously put it to work. I believe the same is true for DevNetOps and NRE.

However, these are just signposts, like the Buddhist lesson that the finger pointing to the moon is not the moon itself. If you miss the moon for the finger, you’ve missed the glory.

The real truth in technology is that transformation is the only timeless topic. Digital transformation has been around for three decades, and the digital intelligence transformation is on its way next.

To manage transformation: in technology, equip for an evolvable architecture; in process, incorporate continuous improvement; and in people, embrace continuous learning.

I’ll leave you with a final quote I often use when speaking on these topics.

It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change. - Charles Darwin

Waving the NRE Flag, Live at Open Networking Summit


Next Wednesday at 11:15, Matt Oswalt and I will be speaking on NRE and DevNetOps at the Open Networking Summit. We’ll be joined by Doug Lardo from Riot Games who will share lessons from the front lines. If you happen to be there, please join us. If not, take to Twitter (jk, mo) or comment below on how you see the evolution of the #NRE.

Podcast on Soundcloud
Podcast on YouTube

image credit steampunk/pixabay