Software-Defined, a Decade Later by James Kelly


The first African-American president had been sworn in, the king of pop had just passed, @realDonaldTrump had taken to tweeting, and Avatar was about to smash box-office records, but what I remember about the summer of 2009 is the dawn of software-defined.

Looking forward to tapas and sunglasses, I landed in Barcelona to present at SIGCOMM on what was then a niche topic: network programmability.

I was feeling good about my presentation on Service Creation with the Junos SDK (now known as JET). I’d trained many people on it, so I knew everything from its use cases to APIs like the back of my hand. Gung-ho, as I shook hands with people in the room, I was greeted with enthusiastic faces.

A decade later, I don’t remember the talk. I do remember its Q&A. Raised hands eclipsed my talk with many questions comparing our SDK to a similar demo and presentation from the day before that I had missed. Well I’ll be damned. I had followed—you guessed it—OpenFlow.

For me that was SDN’s big bang moment. The ensuing unbridled enthusiasm for OpenFlow percolated up from this academic setting into a frenzy of new foundations and projects, controllers, APIs, and a smorgasbord of overlay and control protocols. As the networking industry was soon flush with SDN startups, many established players SDN-washed anything that resembled software, and soon that spread to a software-defined movement that impacted all things infrastructure.

Back then, “en primeur” SDN looked like a dicey proposition, but its preteen years have seen it mature in the data center and WAN. Today, software-defined is almost a norm, expected to be poured into all places in network, to keep up with the operational reliability and speed demanded of these times of cloud-charged disruption.

As multicloud infrastructure, the new IT platform, permits and needs greater automation, digital operations teams are impelled to a new status quo: software-defined and AI-driven. While we still hear of SDDC, SD-WAN and SD-Branch, expectations today are that all places in network are ruled with software. And soon, if not already, that software must be AI-fueled for smarter AIOps and reliability engineered like DevOps.

Putting aside expectations and buzzwords, let’s objectively look at how SDN is moving along. 2019’s litmus test is a far cry from the old standby of control and data plane separation.

How is SDN Evolving in the Open?

It’s impossible to write an account of SDN without giving meaningful consideration to openness.

In some enterprise engineering teams, closed and proprietary solutions are contraband, but when it comes to SDN, there are few open-source projects simple enough to operationalize right off the shelf of GitHub or DockerHub. Commercially available products still carry the day, but a predilection for open source hangs heavy in the air as it brings the benefits of a community for discussion and sharing of automated workflow tools, frameworks, tests and playbooks.

While open source can introduce economies of multi-vendor engineering, testing and co-creation with code-savvy customers, it’s not replacing open standards. If such a replacement is looming, it’s narrowly in the cloud-native space of securely connecting Kubernetes application clusters and service meshes. But where SDN meets hardware top to bottom, and where multiple SDNs meet to federate and span domain boundaries end to end, interoperability isn’t yet forged in open source. Multi-vendor interoperability and multi-SDN system federation hinge on open standards-based protocols, especially those truly proven and widespread standards.

Interoperability prevents technical debt and increases customer freedom. However, customer freedom is an unlikely strategy for the largest vendor incumbents, which is why we still see some rival SDNs bound to vendor hardware and vendor workload orchestration systems.

Interoperability should also not be confused with overlays, which are merely agnostic to the IP network on which they ride. Overlays provide separation of concerns, and like interoperability, they afford a way to insert into brownfield environments. But without interoperable network boundaries, overlay SDN solutions are islands eventually destined to become technical debt, the ball and chain to IT progress.

Security is Shifting Left

SDN openness may be in various shades of optional, but everyone understands security is a must because from the network engineer to the board of directors, people today are acutely aware of the specter of security breaches and attacks.

Whether it is multi-tenancy, micro-segmentation, mutual identity authentication, or secure SD-WAN based on next-gen firewalls, in software development and network development security has shifted left, moving it earlier on the timeline of project considerations. Infosec used to be a rubber stamp at the end of projects; now security is foundational and a first priority.

With the progression of SDN, instead of only traditional secure perimeters, network security is now getting measured out in coarse- and fine-grained means for multi-factor defense in depth. And in the race against advanced and automated threats, software-defined systems have also made it simpler to manage security policies and automatically enforce them by applying protection within the network faster than ever before.

How is SDN Evolving Automation?

In the aftermath of the early SDN hype, the industry experienced progression on the front of orchestrating operations and regression from the focus on automating them.

The evolution of controllers made automation turnkey, focusing on what instead of how. In data centers, the dynamic machinery inside of the network was automated in step with workflow orchestrators like vSphere, OpenStack, and today, more so Kubernetes and OpenShift. In the case of SD-WAN, orchestration meant dealing in zero-touch branch onboarding and choosing WAN policy levels for application traffic. Automation to regulate traffic steering across the hybrid WAN uplinks was baked in.

These are oversimplifications of SDN applications, but across SDN domains, the common thread is that 1. control is centralized and abstracted above the device level to span a distributed fleet of infrastructure; and 2. controllers automate workflows, orchestrating, reducing and simplifying steps. The industry attention spent adapting to software-defined has, until recently, been mostly about these benefits of moving from commanding device CLIs to orchestrating controller GUIs. But orchestration implies automated networks, and changes nothing about automated network-ing. Automated network operations (NetOps) is the next frontier.

While workflows are leveled up and centralized with SDN, so are workflow APIs, another key aspect of SDN systems. With this, there is also the possibility to automate NetOps: evolving to automated testing, troubleshooting, change controls, service-level indicators, and service-level reliability. This is an area of network automation focused less on the automation technology inside software-defined networks, and more on the processes and people that will software-define operations.

In this transformational movement that network engineers have dubbed network reliability engineering (NRE), a subject we at Juniper have promoted a good deal, technology also plays an important role. SDN systems are important so that engineers are building on SDN abstractions and workflows and doing so with centralized APIs instead of down at each device.

Analytics and AI Are Driving the Future of Automation

Big data analytics, machine learning and AI are certainly sensationalized, but that’s because they show promise. If they can be applied to customer trends, voter preferences, and many other fields with complex data points, they can also be applied to IT. A prerequisite to this is already found in SDN systems: a central point of management. And networks are teeming with data to be centrally collected and processed.

Many SDN systems already incorporate telemetry and analytics, and some of them leverage AI. If they don’t, you can bet it’s on the roadmap. And analytics and AI are important ingredients in automation, both within SDN systems and for improving the operations-contextual automated NetOps on top of SDN.

Since Juniper just acquired Mist, a timely example is the AI-driven Mist Cloud. Combining data points from Wi-Fi and BLE radio, wired, and wide-area networks, the Mist Cloud software uses AI to crunch location-addled degradations and dynamics. It’s able to stabilize network reliability and raise user experience problems, even outside of the WLAN. On the NetOps side of things, Mist’s AI-powered assistant, Marvis, smartens up the human touch and mitigates mistakes.

Another example is found in network reliability engineering. In the data-driven SRE and NRE culture, you can’t manage what you can’t measure. Many SDN systems have higher-order reporting, metrics and centralized telemetry APIs, aiding in the creation of service-level indicators (SLIs). SLIs are important because they map to objectives about the stakeholder-oriented service levels instead of SNMP or metrics that are meaningless to those outside of networking.
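
To make the SLI idea less abstract, here is a minimal sketch in Python of an availability SLI and the error budget it implies. The probe data shape, the function names and the 99.9% objective are my own assumptions for illustration, not the API of any particular SDN system; in practice the counts would come from whatever centralized telemetry your controller exposes.

```python
from dataclasses import dataclass


@dataclass
class ProbeWindow:
    """Synthetic probe results for one reporting window (hypothetical shape)."""
    total: int       # probes sent
    successful: int  # probes that met the success criterion


def availability_sli(window: ProbeWindow) -> float:
    """SLI: fraction of probes that succeeded in the window."""
    if window.total == 0:
        raise ValueError("no probes in window")
    return window.successful / window.total


def error_budget_remaining(sli: float, slo: float) -> float:
    """Fraction of the error budget still unspent (negative once the SLO is missed)."""
    allowed_failure = 1.0 - slo
    if allowed_failure <= 0:
        raise ValueError("SLO must be below 1.0")
    return 1.0 - (1.0 - sli) / allowed_failure


# Illustrative numbers only: 10,000 probes, 5 failures, a 99.9% objective.
window = ProbeWindow(total=10_000, successful=9_995)
sli = availability_sli(window)
budget = error_budget_remaining(sli, slo=0.999)
print(f"SLI={sli:.4f} budget_remaining={budget:.0%}")
```

The point of the design is that stakeholders see one number tied to a promise, while the raw counters behind it stay inside the networking team.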

NREs also manage errors and outages by automating around them for continuous improvement. As SDN systems begin to incorporate AI-powered predictive analytics, NREs can use those signals to better uphold service levels with automated proactive steps to complement reactive remediation.

Clouds are in the Forecast

The cloud is replete with as-a-service IT offerings, but many SDN systems are still “shrink wrapped software” as downloads to run on premises or in private clouds.

Many networking teams would like to use or at least evaluate SDN, but jumping headlong into unboxed software like SDN is not accessible to smaller networking teams when they have the added work of learning to run and maintain SDN systems. For this reason, cloud-delivered SDN offerings are kindling more SDN adopters. This week, Juniper announced that it is taking its SD-WAN solution and launching the option for Contrail SD-WAN as a service.

Cloud-based SDN for all domains of networking seems interesting but won’t be right for all customers nor all use cases. Cloud-based data collection comes with mixed benefits and drawbacks.

For some, sending telemetry data to the cloud may be prohibited or prohibitively inefficient, especially in cases where certain telemetry requires a quick reaction. On the other hand, to the cloud’s benefit, if many networking teams don’t have the means to operate SDN, then they surely don’t have the means to run sophisticated analytics or AI systems on premises. A cloud service solves for simplicity. It also solves for quicker vendor innovation cycles and smarter processing because the economies of scale for data storage, analytics and now hardware-assisted AI processing based in the cloud are enormous and incomparable.

In summary, the first software-defined decade has been spectacular, and seasoned solutions are seeing a lot of production. Now, with so much contributing to SDN’s innovation and reinvention, I look forward to the next odyssey and seeing what will define software-defined. Drop a comment below to share your own anticipations.

Podcast of this blog on YouTube.

image credit moritz320/pixabay

A Wedding Thank You and Season's Greetings on our 1st Anniversary by James Kelly

Together in Paris, September 2018

One year has flown by. Two bands are now melded onto our ring fingers. Three days of celebration stories have been shared over and over. Seventy-seven guests need to be thanked. Thousands of flowers have colored countless happy memories. And at least a million emojis have been exchanged between Linh and James. All of this since our wedding day, November 19, 2017.

Oh… and a number that most people don’t know: all of them. That’s how many celebrities across the world thanked us for providing them peace for a day because our wedding seemed to pull all the paparazzi. Kidding aside, we’re still so thankful for all the photographers, organizers, musicians and staff, and hope all our friends caught the albums we shared.

On our first anniversary today, Linh and I are back together in Hanoi, and we are reflecting back on our year, especially on our wedding day, one year ago.

Thank you!

Our gratitude is the best place to start, not only because it’s Thanksgiving this week in the States, but because our wedding happiness was truly magnified by those of you that attended and celebrated our union.

Many of you traveled so far and filled the day with so much love. We have been fortunate to see many of you through 2018 so far, and others we still plan to visit. Either way, everyone has been in our hearts and memories when we often think back to our wedding day or watch the photos go by on our digital photo frames at home. It was such a beautiful day. To all who were able to make the hike, thank you for being there for us. And to those that gave to our Worldvision cause on Crowdrise, we value your kindness and generosity.

Since our Wedding

We’ll never forget the wedding day’s sore cheeks from laughing, the hugs, tears and cheers of love, and tearing up the dance floor all night. Speaking of Felicity (the dancing queen that night), we can’t wait to see her, Edward, Victoria and Rob over New Years. For Christmas, we’ll be in Nantwich, England with Barb and Michael. And working backward from now, here’s a quick recap of our 2018 together:

  • We’re in Hanoi and Phu Quoc, Vietnam this week to celebrate our anniversary, seeing many friends and family, especially our dear Minh Anh and her team perform their choreographed dance in traditional Ao Dais at a school show

  • Front row at the Jay-Z and Beyonce OTR2 concert in Vancouver

  • Traipsing around Paris to revisit our old neighborhood and remember our stint living there in 2015

  • Catching up with Jeroen and Giang in Amsterdam

  • Wine tasting in the Yarra Valley, and drinking Magics in Melbourne

  • Visiting Hieu Down Under in Sydney

  • The inaugural camping trip in BC with dad Kelly and Minh Anh, and a BC wine tasting tour in the Okanagan Valley with Jill and Abhik

  • Cruising around Copenhagen, Oslo and Stockholm

  • Standing in the sakura snow of the Kyoto and Tokyo cherry blossom festival

  • Remembering and honoring dad Pham at his funeral after he sadly passed in January

  • And getting warm by the fire and sharing holiday family meals around the table with dad Kelly and Minh Anh in Vancouver

It has been another fast-paced, jet-setting year for us as my manager keeps us busy as usual, and by my manager, I mean my wife Linh. The only slow stuff has been waiting for Linh and Minh Anh to get their paperwork sorted to be able to come Stateside…something it seems will take at least until the summer of 2019.

As we proceed into the holiday season of Thanksgiving, Christmas, New Years and Lunar New Year, we are very blessed and happy to be spending time with family and friends. Thank you for being a special part of our wedding and our lives. We wish you friends and family much love and joy this season.

-James and Linh

The Tale of the Network Automator: From Knight Errant to Engineer by James Kelly


Podcast on YouTube

Perchance the temptation of technology and for certain the pangs of toil have persuaded some networkers to stray from the path of the CLI and *IE certification race.

Swinging APIs, they’ve heard, is the promised land, and wielding shiny tools like Ansible can not only confirm their chivalry but also chase away any lament of rote configuration changes and misfortunes of fat fingers.

“Tonight, Sancho, we drink from the cup of automation. And we will be safely protected from any insurgent application issues rising to the heights of our virtuous network.”

Chapter 1: Initiation

And so once upon a time, after the ecstasy of donning the automation helmet and downloading a few horses, networkers looking for a rite of passage into the world of automation looked bravely in the mirror and said to themselves, “what is there to do that’s dangerous around here?”

The way many networkers made haste to an automation quest was ostensibly with all the strategy of the ad hoc adventures of a knight errant, looking to prove their worth.

These trailblazers were defined by the natural signposts in their workflows, by vendor toolkits, and by tools imported from the far away land of sysadmin. And so, often under the tutelage of nothing but their own trial and error, the networker knights errant learned to conquer a couple of the workflows that they were presented with, slaying manual steps one by one.

Chapter 2: Calamity

Shortly thereafter, while riding proudly toward a well-deserved weekend, the knights were so enchanted with their new weaponry and so pleased with themselves (their heads, swollen with victories, may have tightened their helmets) that their attention and vision were impaired. And it was while fantasizing of their next rout that the slings and arrows started flying.

The actual assault is not important: a sudden unreachability, an impetuous line-of-business change request, a cloud VPC meltdown. It was as unexpected as it was inevitable. For some of the knights the affliction would uncover a weakness in armor. For others, it may have been the sting of the automation sword that was erratically swung and cut the wrong way, this time automating and thus swiftly amplifying an accident.

Chapter 3: Dedication

For some beginners, the setback of an automation blunder was so vexing that they turned back to peasant life. For many however, after finding their feet again and after some commiseration with knight companions, they found the strength to press on, carrying the lesson of their lapse forward to not repeat that unforgettable venomous mistake.

In knighthood, the networkers aspired to be so valiant and so adroit that they got the notion that if they’re doing it right, they won’t have difficulties. But difficulties, it turns out, are precisely the best circumstances for learning.

Ergo, the knight automators that persisted and persevered learned to be magnanimous with themselves first and foremost. To uphold future reliability, they resolved to automate around the woes that wounded them. In this way, an indiscretion, once defeated with automation, would never weasel its way back to a second sour encounter.

Chapter 4: Trials

Again and again in the hardship of automating their accomplishments, the networker knights became more astute and attuned to their environment. They began forging their own armaments and virtual squires to do their bidding. They also became less foolish and blunt on their escapades, taking what was useful from other toolsmiths and intrepid automators with software engineering skills.

In their training, the knights’ tenacity grew used to absorbing the lessons of failure, but they grew tired of the disrepute and thorns of trial and error.

As fortune would have it, one day they encountered software sages that spoke of staged replica battles. “But what is the use of repeating the past if we have integrated its teachings?” the knights inquired. “Not the past;” explained a sage, “we stage in preparation for any future change and crusade with many possibilities accounted for and tested.”  Testing and practice ahead of affronting a matter: it sounded ingenious, and so the knights too started building training grounds to rehearse their affairs.

Through testing, stressing and staging, and then automating and strengthening that preparation work too, the knights became known for their fastidious preparation and resulting dependability to conquer new projects and change conquests in production, and now with less havoc than sorely familiar from the past. Soon they became so mighty that their automation could take on more incessant action and circumstances of increasing variety.

Chapter 5: Consummation

Becoming even more zealous about automating trials not to error, the knights decided to consummate their erudition and experience by taking new identities as knight engineers.

See, each knight’s struggles and the many stories shared at the inns gradually made them less wandering and more disciplined and deliberate about their path. Their practice became more ritual and rigorous, and their automating more rewarding. Through their evolving wisdom, they had transformed from errant to engineer, worthy of an order or iron ring.

In due course, through tournaments and their own track record, the knight engineers decided they could not be passionate, proud engineers unless they could also measure and show their success. After all, they vowed reliability to others. And so, they decided to build public displays for every pledge of reliability that would maintain accurate assessments of their results. Constructed with even more automation, these displays objectively told of the current conduct and past performance of the systems in their lands.

Ultimately, so well-known these reliability signs of service became, that the ardent automation engineers are today called network reliability engineers.

Coda: The Canons of NRE

There are two discernable standards to which the knights held.

First, reliability was their utmost value. They determined that at the base of the hierarchy of needs for their service, if the network was not measurably reliable in the ways they promised, nothing else mattered. Instead of trading off reliability with other values like agility, efficiency and so on, these boons were incidental because they first solved for reliability with automation.

Second, engineering is the best road to take as an automator. The processes and skills of software engineering guide far better than getting consumed with happened-upon technology, tools and APIs. Networking workflows are the battleground in which to practice automating and a brilliant place to start ad hoc, but ultimately the rigors of software engineering will provide strategy and structure for the tactics and tools.

For more on the culture, skills, processes, behaviors and common technologies of network reliability engineer (NRE) roles and teams, look to the famous older cousins: site reliability engineers (SRE). Check out the free online SRE book and my past blogs.


Thanks to the NREs, SREs and network automator forebears for your inspiration, enlightenment and advice in putting together the 5-step journey to automated NetOps. With that guide, may the path of forthcoming automators be smoother and more straightforward than your own story.

Cliff Notes Translation

For those unfamiliar with the famous humor and satire of the book Don Quixote, the above story and style may require some explanation. However, I suspect that anyone even the slightest bit attentive to the network automation space can see the parallel between this metaphor and real life.

Here are the Cliff Notes of the stereotypical story:

Chapter 1: Not looking before they leap into automation, many people progress in a rather ad hoc manner, turning tribal knowledge and apparent workflows into an opportunity to learn while aggregating manual tasks. This is mostly governed by the tools they know or those put in front of them, which may or may not be the best tools for the jobs at hand.

Chapter 2: Many people get stung by automation in one way or another. Especially with config management automation, one small issue could easily get propagated to a massive blast radius. Change workflows aren’t the safest place to learn, but they are the most infamous for some reason. Automating directly in production when learning and without testing is also a catastrophe waiting to happen.
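
One common way to shrink that blast radius is to roll a change out in growing waves and halt at the first sign of trouble. The sketch below is a hedged illustration in Python: `apply_change` and `healthy` are hypothetical callables standing in for whatever your tooling uses to push config and check a device, and the wave sizes are arbitrary.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def batches(devices: Sequence[str], sizes: Iterable[int]) -> List[List[str]]:
    """Split devices into waves of the given sizes; a final wave takes the remainder."""
    waves, start = [], 0
    for size in sizes:
        if start >= len(devices):
            break
        waves.append(list(devices[start:start + size]))
        start += size
    if start < len(devices):
        waves.append(list(devices[start:]))
    return waves


def staged_rollout(
    devices: Sequence[str],
    apply_change: Callable[[str], object],
    healthy: Callable[[str], bool],
    wave_sizes: Iterable[int] = (1, 5, 25),
) -> Tuple[List[str], bool]:
    """Apply a change wave by wave and stop after the first unhealthy wave.

    Returns (changed, halted): the devices touched so far and whether we stopped early.
    """
    changed: List[str] = []
    for wave in batches(devices, wave_sizes):
        for device in wave:
            apply_change(device)
            changed.append(device)
        if not all(healthy(d) for d in wave):
            return changed, True
    return changed, False


# Simulated fleet: a bad change makes every touched device unhealthy.
fleet = [f"leaf{i}" for i in range(100)]
touched = set()
changed, halted = staged_rollout(fleet, touched.add, healthy=lambda d: False)
print(len(changed), halted)
```

With the bad change simulated here, the rollout halts after the first wave, so a single device is affected rather than the whole fleet. A real version would also roll back the devices in `changed`.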

Chapter 3: Continuous improvement and learning is always the foundation of good automation. People learn that small changes and sometimes small mistakes are where the lessons lie to help discover what to automate next. This can be taken all the way to NRE concepts like chaos engineering, where failure is induced on purpose.

Chapter 4: The rigors of software engineering like test-driven engineering, continuous integration and delivery (CI/CD) and automated deployment are critical to success and safely moving quickly, instead of trading off reliability for velocity. Automation is learned in pre-production environments like labs or virtual labs. Over time, engineers also create well-tested in-production continuous response.
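
As a small taste of testing before production, here is a sketch of a pre-deployment check that a CI pipeline could run against proposed interface addressing. The intent format (a dict of interface name to address/prefix) and the function name are assumptions made for this example; only the standard-library `ipaddress` module is used.

```python
import ipaddress
from typing import Dict, List


def validate_interfaces(intent: Dict[str, str]) -> List[str]:
    """Return readable problems found in an {interface: "addr/prefix"} intent."""
    problems = []
    seen = []  # (name, parsed interface) pairs already accepted
    for name, cidr in intent.items():
        try:
            iface = ipaddress.ip_interface(cidr)
        except ValueError:
            problems.append(f"{name}: invalid address {cidr!r}")
            continue
        for other_name, other in seen:
            if iface.network.overlaps(other.network):
                problems.append(f"{name}: overlaps {other_name}")
        seen.append((name, iface))
    return problems


# Hypothetical intents: one clean, one with an overlap and a typo.
good = {"ge-0/0/0": "10.0.0.1/31", "ge-0/0/1": "10.0.0.3/31"}
bad = {"ge-0/0/0": "10.0.0.1/24", "ge-0/0/1": "10.0.0.200/25", "lo0": "10.0.0.300/32"}

print(validate_interfaces(good))
print(validate_interfaces(bad))
```

A pipeline would fail the build whenever the returned list is non-empty, so the mistake is caught in a lab or on a laptop instead of on a live device.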

Chapter 5: Instead of relying on only general monitoring tools, NREs, like SREs, engineer what matters most: service-level indicators to objectively measure success of higher-order service goals and promises. With the help of data, they manage their way to truly measured continuous improvement.

image credit blitzmaerker/pixabay