Software-Defined, a Decade Later by James Kelly


The first African-American president had been sworn in, the king of pop had just passed, @realDonaldTrump had taken to tweeting, and Avatar was about to smash box-office records, but what I remember about the summer of 2009 is the dawn of software-defined.

Looking forward to tapas and sunglasses, I landed in Barcelona to present at SIGCOMM on what was then a niche topic: network programmability.

I was feeling good about my presentation on Service Creation with the Junos SDK (now known as JET). I’d trained many people on it, so I knew everything from its use cases to APIs like the back of my hand. Gung-ho, as I shook hands with people in the room, I was greeted with enthusiastic faces.

A decade later, I don’t remember the talk. I do remember its Q&A. Raised hands eclipsed my talk with many questions comparing our SDK to a similar demo and presentation from the day before that I had missed. Well I’ll be damned. I had followed—you guessed it—OpenFlow.

For me that was SDN’s big bang moment. The ensuing unbridled enthusiasm for OpenFlow percolated up from this academic setting into a frenzy of new foundations and projects, controllers, APIs, and a smorgasbord of overlay and control protocols. As the networking industry was soon flush with SDN startups, many established players SDN-washed anything that resembled software, and soon that spread to a software-defined movement that impacted all things infrastructure.

Back then, “en primeur” SDN looked like a dicey proposition, but its preteen years have seen it mature in the data center and WAN. Today, software-defined is almost a norm, expected to be poured into all places in network, to keep up with the operational reliability and speed demanded of these times of cloud-charged disruption.

As multicloud infrastructure, the new IT platform, permits and needs greater automation, digital operations teams are impelled to a new status quo: software-defined and AI-driven. While we still hear of SDDC, SD-WAN and SD-Branch, expectations today are that all places in network are ruled with software. And soon, if not already, that software must be AI-fueled for smarter AIOps and reliability engineered like DevOps.

Putting aside expectations and buzzwords, let’s objectively look at how SDN is moving along. 2019’s litmus test is a far cry from the old standby of control and data plane separation.

How is SDN Evolving in the Open?

It’s impossible to write an account of SDN without giving meaningful consideration to openness.

In some enterprise engineering teams, closed and proprietary solutions are contraband, but when it comes to SDN, there few open-source projects simple enough to operationalize right off the shelf of GitHub or DockerHub. Commercially available products still carry the day, but a predilection for open source hangs heavy in the air as it brings the benefits of a community for discussion and sharing of automated workflow tools, frameworks, tests and playbooks.

While open source can introduce economies of multi-vendor engineering, testing and co-creation with code-savvy customers, it’s not replacing open standards. If such a replacement is looming, it’s narrowly in the cloud-native space of securely connecting Kubernetes application clusters and service meshes. But where SDN meets hardware top to bottom, and where multiple SDNs meet to federate and span domain boundaries end to end, interoperability isn’t yet forged in open source. Multi-vendor interoperability and multi-SDN system federation hinges on open standards-based protocols, especially those truly proven and widespread standards.

Interoperability prevents technical debt and increases customer freedom. However, customer freedom is an unlikely strategy for the largest vendor incumbents, which is why we still see some rival SDNs bound to vendor hardware and vendor workload orchestration systems.

Interoperability should also not be confused with overlays which are merely agnostic to what IP network on which they ride. Overlays provide separation of concerns, and like interoperability, they afford a way to insert into brownfield environments. But without interoperable network boundaries, overlay SDN solutions are islands eventually destined as technical debt, the ball and chain to IT progress.

Security is Shifting Left

SDN openness may be in various shades of optional, but everyone understands security is a must because from the network engineer to the board of directors, people today are acutely aware of the specter of security breaches and attacks.

Whether it is multi-tenancy, micro-segmentation, mutual identity authentication, or secure SD-WAN based on next-gen firewalls, in software development and network development security has shifted left, moving it earlier on the timeline of project considerations. Infosec used to be a rubber stamp at the end of projects, now security is foundational and a first priority.

With the progression of SDN, instead of only traditional secure perimeters, network security is now getting measured out in course- and fine-grained means for multi-factor defense in depth. And in the race against advanced and automated threats, software-defined systems have also made it simpler to manage security policies and automatically enforce them by applying protection within the network faster than ever before.

How is SDN Evolving Automation?

In the aftermath of the early SDN hype, the industry experienced progression on the front of orchestrating operations and regression from the focus on automating them.

The evolution of controllers made automation turnkey, focusing on what instead of how. In data centers, the dynamic machinery inside of the network was automated in step with workflow orchestrators like vSphere, OpenStack, and today, more so Kubernetes and OpenShift. In the case of SD-WAN, orchestration meant dealing in zero-touch branch onboarding and choosing WAN policy levels for application traffic. Automation to regulate traffic steering across the hybrid WAN uplinks was baked in.

These are oversimplifications of SDN applications, but across SDN domains, the common thread is that 1. control is centralized and abstracted above the device level to span a distributed fleet of infrastructure; and 2. controllers automate workflows, orchestrating, reducing and simplifying steps. The industry attention spent adapting to software-defined has, until recently, been mostly about these benefits of moving from commanding device CLIs to orchestrating controller GUIs. But orchestration implies automated networks, and changes nothing about automated network-ing. Automated network operations (NetOps) is the next frontier.

While workflows are leveled up and centralized with SDN, so are workflow APIs, another key aspect of SDN systems. With this, there is also the possibility to automate NetOps: evolving to automated testing, troubleshooting, change controls, service-level indicators, and service-level reliability. This is an area of network automation focused less on the automation technology inside software-defined networks, and more on the processes and people that will software-define operations.

In this transformational movement that network engineers have dubbed network reliability engineering (NRE), a subject we at Juniper have promoted a good deal, technology also plays an important role. SDN systems are important so that engineers are building on SDN abstractions and workflows and doing so with centralized APIs instead of down at each device.

Analytics and AI Are Driving the Future of Automation

Big data analytics, machine learning and AI are certainly sensationalized, but that’s because they show promise. If they can be applied to customer trends, voter preferences, and many other fields with complex data points, they can also be applied to IT. A prerequisite to this is already found in SDN systems: a central point of management. And networks are teeming with data to be centrally collected and processed.

Many SDN systems already incorporate telemetry, analytics and some of them leverage AI. If they don’t, you can bet it’s on the roadmap. And analytics and AI are important ingredients in automation, both within SDN systems and for improving the operations-contextual automated NetOps on top of SDN.

Since Juniper just acquired Mist, a timely example is the AI-driven Mist Cloud. Combining data points from Wi-Fi and BLE radio, wired, and wide-area networks, the Mist Cloud software uses AI to crunch location-addled degradations and dynamics. It’s able to stabilize network reliability and raise user experience problems, even outside of the WLAN. On the NetOps side of things, Mist’s AI-powered assistant, Marvis, smartens up the human touch and mitigates mistakes.

Another example is found in network reliability engineering. In the data-driven SRE and NRE culture, you can’t manage what you can’t measure. Many SDN systems have higher-order reporting, metrics and centralized telemetry APIs, aiding in the creation of service-level indicators (SLIs). SLIs are important because they map to objectives about the stakeholder-oriented service levels instead of SNMP or metrics that are meaningless to those outside of networking.

NREs also manage errors and outages by automating around them for continuous improvement. As SDN systems begin to incorporate AI-powered predictive analytics, NREs can use those signals to better uphold service levels with automated proactive steps to complement reactive remediation.

Clouds are in the Forecast

The cloud is replete with as-a-service IT offerings, but many SDN systems are still “shrink wrapped software” as downloads to run on premises or in private clouds.

Many networking teams would like to use or at least evaluate SDN, but jumping headlong into unboxed software like SDN is not accessible to smaller networking teams when they have the added work of learning to run and maintain SDN systems. For this reason, cloud-delivered SDN offerings are kindling more SDN adopters. This week, Juniper announced that it is taking its SD-WAN solution and launching the option for Contrail SD-WAN as a service.

Cloud-based SDN for all domains of networking seems interesting but won’t be right for all customers nor all use cases. Cloud-based data collection comes with mixed benefits and drawbacks.

For some, sending telemetry data to the cloud may be prohibited or prohibitively inefficient, especially in cases where certain telemetry requires a quick reaction. On the other hand, to the cloud’s benefit, if many networking teams don’t have the means to operate SDN, then they surely don’t have the means to run sophisticated analytics or AI systems on premises. A cloud service solves for simplicity. It also solves for quicker vendor innovation cycles and smarter processing because the economies of scale for data storage, analytics and now hardware-assisted AI processing based in the cloud are enormous and incomparable.

In summary, the first software-defined decade has been spectacular, and seasoned solutions are seeing a lot of production. Now, with so much contributing to SDN’s innovation and reinvention, I look forward to the next odyssey and seeing what will define software-defined. Drop a comment below to share your own anticipations.

Podcast of this blog on YouTube.

image credit moritz320/pixabay

The Tale of the Network Automator: From Knight Errant to Engineer by James Kelly


Podcast on YouTube

Perchance the temptation of technology and for certain the pangs of toil have persuaded some networkers to stray from the path of the CLI and *IE certification race.

Swinging APIs, they’ve heard, is the promised land and wielding shiny tools like Ansible can, not only confirm their chivalry, but also chase away any lament of rote configuration changes and misfortunes of fat fingers.

“Tonight, Sancho, we drink from the cup of automation. And we will be safely protected from any insurgent application issues rising to the heights of our virtuous network.”

Chapter 1: Initiation

And so once upon a time, after the ecstasy of donning the automation helmet and downloading a few horses, networkers looking for a rite of passage into the world of automation looked bravely in the mirror and said to themselves, “what is there to do that’s dangerous around here?”

The way many networkers made haste to an automation quest was ostensibly with all the strategy of the ad hoc adventures of a knight errant, looking to prove their worth.

These trailblazers were defined by the natural signposts in their workflows, by vendor toolkits, and by tools imported from the far away land of sysadmin. And so, often under the tutelage of nothing but their own trial and error, the networker knight errants learned to conquer a couple of the workflows that they were presented with, slaying manual steps one by one.

Chapter 2: Calamity

Shortly thereafter, while riding proudly toward a well-deserved weekend, the knights were so enchanted with their new weaponry and so pleased with themselves—their heads, swollen with victories, may have tightened helmets to their heads—that their attention and vision was impaired. And it was while fantasizing of their next rout that the slings and arrows started flying.

The actual assault is not important: a sudden unreachability, an impetuous line-of-business change request, a cloud VPC meltdown. It was as unexpected as it was inevitable. For some of the knights the affliction would uncover a weakness in armor. For others, it may have been the sting of the automation sword that was erratically swung and cut the wrong way, this time automating and thus swiftly amplifying an accident.

Chapter 3: Dedication

For some beginners, the setback of an automation blunder was so vexing that they turned back to peasant life. For many however, after finding their feet again and after some commiseration with knight companions, they found the strength to press on, carrying the lesson of their lapse forward to not repeat that unforgettable venomous mistake.

In knighthood, the networkers aspired to be so valiant and so adroit that they got the notion that if they’re doing it right, they won’t have difficulties. But difficulties, it turns out, are precisely the best circumstances for learning.

Ergo, the knight automators that persisted and persevered learned to be magnanimous with themselves first and foremost. To uphold future reliability, they resolved to automate around the woes that wounded them. In this way, an indiscretion, once defeated with automation, would never weasel its way back to a second sour encounter.

Chapter 4: Trials

Again and again in the hardship of automating their accomplishments, the networker knights became more astute and attuned to their environment. They began forging their own armaments and virtual squires to do their bidding. They also became less foolish and blunt on their escapades, taking what was useful from other toolsmiths and intrepid automators with software engineering skills.

In their training, the knights’ tenacity grew used to absorbing the lessons of failure, but they grew tired of the disrepute and thorns of trial and error.

As fortune would have it, one day they encountered software sages that spoke of staged replica battles. “But what is the use of repeating the past if we have integrated its teachings?” the knights inquired. “Not the past;” explained a sage, “we stage in preparation for any future change and crusade with many possibilities accounted for and tested.”  Testing and practice ahead of affronting a matter: it sounded ingenious, and so the knights too started building training grounds to rehearse their affairs.

Through testing, stressing and staging, and then automating and strengthening that preparation work too, the knights became known for their fastidious preparation and resulting dependability to conquer new projects and change conquests in production, and now with less havoc than sorely familiar from the past. Soon they became so mighty that their automation could take on more incessant action and circumstances of increasing variety.

Chapter 5: Consummation

Becoming even more zealous about automating trials not to error, the knights decided to consummate their erudition and experience by taking new identities as knight engineers.

See, each knight’s struggles and the many stories shared at the inns gradually made them less wandering and more disciplined and deliberate about their path. Their practice became more ritual and rigorous, and their automating more rewarding. Through their evolving wisdom, they had transformed from errant to engineer, worthy of an order or iron ring.

In due course, through tournaments and their own track record, the knight engineers decided they could not be passionate, proud engineers unless they could also measure and show their success. After all, they vowed reliability to others. And so, they decided to build public displays for every pledge of reliability that would maintain accurate assessments of their results. Constructed with even more automation, these displays objectively told of the current conduct and past performance of the systems in their lands.

Ultimately, so well-known these reliability signs of service became, that the ardent automation engineers are today called network reliability engineers.

Coda: The Canons of NRE

There are two discernable standards to which the knights held.

First, reliability was their utmost value. They determined that at the base of the hierarchy of needs for their service, if the network was not measurably reliable in the ways they promised, that nothing else mattered. Instead of trading off reliability with other values like agility, efficiency and so on, these boons were incidental because they first solved for reliability with automation.

Second, engineering is the best road to take as an automator. The processes and skills of software engineering guide far better than getting consumed with happened-upon technology, tools and APIs. Networking workflows are the battleground in which to practice automating and a brilliant place to start ad hoc, but ultimately the rigors of software engineering will provide strategy and structure for the tactics and tools.

For more on the culture, skills, processes, behaviors and common technologies of network reliability engineer (NRE) roles and teams, look to the famous older cousins: site reliability engineers (SRE). Check out the free online SRE book and my past blogs.


Thanks to the NREs, SREs and network automator forebears for your inspiration, enlightenment and advice in putting together the 5-step journey to automated NetOps. With that guide, may the path of forthcoming automators be smoother and more straightforward than your own story.

Cliff Notes Translation

For those unfamiliar with the famous humor and satire of the book Don Quixote, the above story and style may require some explanation. However, I suspect that those, even the slightest bit attentive to the network automation space, can see the parallel between this metaphor and real life.

Here are the Cliff Notes of the stereotypical story:

Chapter 1: Not looking before they leap into automation, many people progress in a rather ad hoc manner, tuning tribal knowledge and apparent workflows into an opportunity to learn while aggregating manual tasks. This is mostly governed by the tools they know or those put in front of them—which may or may not be the best tool for the jobs at hand.

Chapter 2: Many people get stung by automation in one way or another. Especially with config management automation, one small issue could easily get propagated to a massive blast radius. Change workflows aren’t the safest place to learn, but they are the most infamous for some reason. Automating directly in production when learning and without testing is also a catastrophe waiting to happen.

Chapter 3: Continuous improvement and learning is always the foundation of good automation. People learn that small changes and sometimes small mistakes are where the lessons lay to help discover what to automate next. This can be taken all the way to NRE concepts like chaos engineering, where failure is induced on purpose.

Chapter 4: The rigors of software engineering like test-driven engineering, continuous integration and delivery (CICD) and automated deployment are critical to success and safely moving quickly, instead of trading off reliability for velocity. Automation is learned in pre-production and environments like labs or virtual labs. Over time, engineers also create well-tested in-production continuous response.

Chapter 5: Instead of relying on only general monitoring tools, NREs, like SREs, engineer what matters most: service-level indicators to objectively measure success of higher-order service goals and promises. With the help of data, they manage their way to truly measured continuous improvement.

Podcast on YouTube

image credit blitzmaerker/pixabay

Introducing NRE Labs by James Kelly


“The 80s called and they want their CLI back.” Until recently, that was basically the beckoning call of the network automation and programmability plot. Like a broken record, the replayed attention on tools and APIs talked of new NetOps technology, but it didn’t paint a picture of the promised land. It didn’t provide a map and it didn’t prepare anyone professionally.

Today, as Network Reliability Engineering (NREs) roles pave the way from CLIs to SLIs and other higher-order SRE-inspired methods and metrics, the picture of automated NetOps is crystalizing. Juniper Networks has elaborated the 5-step map of the journey. And now, with the launch of NRE Labs, Juniper is introducing the virtual training camp to support the trip to the promised land. It’s open—open source and open for use—and it’s by and for network engineers.

People are the greatest asset or predicament to any transformation like automating NetOps, so this is where the focus was on democratizing NRE learning for all.

An Automation Dojo in the Browser

For most network engineers, automating has either been an ad hoc adventure, or the barrier to entry has stopped engineers from starting. And without the right roots, asking for automation from network engineers is like asking for pears from a pine tree. Hands-on time is needed to form the software engineering skills and experience needed to architect and automate.

Providing a cheap, quick and easy solution, NRE Labs is free and easily accessible. It includes live terminal access to one’s own network devices and linux systems, right in the browser. Each learning lab topic is pieced apart into lessons that are, in turn, comprised of short lab steps that each take only a couple minutes. This is important because it eliminates those overwhelming obstacles to getting started with hands-on learning.

  • It removes the risk of learning by trial and error in production

  • It doesn’t require a sign-up and doesn’t have a classroom or a teacher

  • It doesn't require a long download or physical or virtual lab setup times

  • It doesn’t demand expertise or painful piecing together of all the background infrastructure

  • It demands zero prerequisite knowledge of tools and programming

Without any major impediments, anybody can jump right in and click through a few lessons right in their web browser.

The main NRE Labs runtime sponsored by Juniper Networks is available at:

Learn by Doing

Education that isn’t applied is quickly forgotten. That’s why the topics and lessons in NRE Labs are organized into real-life NetOps contexts and workflows. NRE Labs is also built with useful systems and tools that are widely available and applicable to network engineers today.

NRE Labs was created to be accessible to anyone, so it starts from scratch with a learning topic for basics, but users can move around lessons as they see fit. Other topics include troubleshooting, configuration, testing and verification. In the future, additional lessons will take these topics deeper and additional topics will take learning broader. This includes covering the spectrum of NetOps workflows; structuring automation and infrastructure as code with gitOps rigor; evolving troubleshooting into proactive testing; orchestrating testing, delivery and deployment with a pipeline; and using telemetry and analytics for monitoring and measuring service-level indicators and automating network remediation and regulation.

Open for Contributions

NRE Labs needs to be open to democratize learning across different networking environments, but it’s also open to keep up with the rate of and scale it to all kinds of innovators broadly automating their networks that have something to share.

The NRE Labs back-end infrastructure and front-end lessons are all open source as the NRE Learning Antidote project and code repository on GitHub. The project’s documentation explores some background on the Kubernetes-based Antidote infrastructure and explains how to run a standalone instance of NRE Labs to develop, test and contribute improvements and lessons.

For the express documentation, Juniper has open sourced the NRE Labs project summary as a poster.

While NRE Labs is live today, it’s in tech preview and we’re looking for quick iterative progress over delayed perfection. We welcome contributions! Other than curriculum development, some of the back-end work revolves around hardening and scaling the infrastructure and shortening start times of network nodes. We’d also greatly appreciate front-end work such as mobile optimizations. Suggestions are welcome and are already flowing in. Use the project’s issues tracker on GitHub to find work or share feedback.

To be Continued

Notice that NRE is learning created by and for network engineers and free of external marketing. You can use it anonymously, but we hope you’ll tell us what you think. 

Beyond learning new NRE and automation skills, NRE Labs aims to spark new, open conversations to listen to or chime in. Follow @nrelabs for updates on new content, improvements and more. We also created an #nre_labs channel in the Slack spaces for the Juniper Engineering Network (EngNet) (Join here) and the vendor-neutral space run by Network2Code (Join here).

Please spread this day-one news and follow the blogs as lessons continue to roll out.

See you in the community,

@mierdin@cloudtoad and @jameskellynet from the NRE Learning Team

This blog was originally published at