Software-Defined, a Decade Later by James Kelly


The first African-American president had been sworn in, the king of pop had just passed, @realDonaldTrump had taken to tweeting, and Avatar was about to smash box-office records, but what I remember about the summer of 2009 is the dawn of software-defined.

Looking forward to tapas and sunglasses, I landed in Barcelona to present at SIGCOMM on what was then a niche topic: network programmability.

I was feeling good about my presentation on Service Creation with the Junos SDK (now known as JET). I’d trained many people on it, so I knew everything from its use cases to APIs like the back of my hand. Gung-ho, as I shook hands with people in the room, I was greeted with enthusiastic faces.

A decade later, I don’t remember the talk. I do remember its Q&A. Raised hands eclipsed my talk with many questions comparing our SDK to a similar demo and presentation from the day before that I had missed. Well I’ll be damned. I had followed—you guessed it—OpenFlow.

For me that was SDN’s big bang moment. The ensuing unbridled enthusiasm for OpenFlow percolated up from this academic setting into a frenzy of new foundations and projects, controllers, APIs, and a smorgasbord of overlay and control protocols. As the networking industry was soon flush with SDN startups, many established players SDN-washed anything that resembled software, and soon that spread to a software-defined movement that impacted all things infrastructure.

Back then, “en primeur” SDN looked like a dicey proposition, but its preteen years have seen it mature in the data center and WAN. Today, software-defined is almost a norm, expected to be poured into all places in network, to keep up with the operational reliability and speed demanded of these times of cloud-charged disruption.

As multicloud infrastructure, the new IT platform, permits and needs greater automation, digital operations teams are impelled to a new status quo: software-defined and AI-driven. While we still hear of SDDC, SD-WAN and SD-Branch, expectations today are that all places in network are ruled with software. And soon, if not already, that software must be AI-fueled for smarter AIOps and reliability engineered like DevOps.

Putting aside expectations and buzzwords, let’s objectively look at how SDN is moving along. 2019’s litmus test is a far cry from the old standby of control and data plane separation.

How is SDN Evolving in the Open?

It’s impossible to write an account of SDN without giving meaningful consideration to openness.

In some enterprise engineering teams, closed and proprietary solutions are contraband, but when it comes to SDN, there few open-source projects simple enough to operationalize right off the shelf of GitHub or DockerHub. Commercially available products still carry the day, but a predilection for open source hangs heavy in the air as it brings the benefits of a community for discussion and sharing of automated workflow tools, frameworks, tests and playbooks.

While open source can introduce economies of multi-vendor engineering, testing and co-creation with code-savvy customers, it’s not replacing open standards. If such a replacement is looming, it’s narrowly in the cloud-native space of securely connecting Kubernetes application clusters and service meshes. But where SDN meets hardware top to bottom, and where multiple SDNs meet to federate and span domain boundaries end to end, interoperability isn’t yet forged in open source. Multi-vendor interoperability and multi-SDN system federation hinges on open standards-based protocols, especially those truly proven and widespread standards.

Interoperability prevents technical debt and increases customer freedom. However, customer freedom is an unlikely strategy for the largest vendor incumbents, which is why we still see some rival SDNs bound to vendor hardware and vendor workload orchestration systems.

Interoperability should also not be confused with overlays which are merely agnostic to what IP network on which they ride. Overlays provide separation of concerns, and like interoperability, they afford a way to insert into brownfield environments. But without interoperable network boundaries, overlay SDN solutions are islands eventually destined as technical debt, the ball and chain to IT progress.

Security is Shifting Left

SDN openness may be in various shades of optional, but everyone understands security is a must because from the network engineer to the board of directors, people today are acutely aware of the specter of security breaches and attacks.

Whether it is multi-tenancy, micro-segmentation, mutual identity authentication, or secure SD-WAN based on next-gen firewalls, in software development and network development security has shifted left, moving it earlier on the timeline of project considerations. Infosec used to be a rubber stamp at the end of projects, now security is foundational and a first priority.

With the progression of SDN, instead of only traditional secure perimeters, network security is now getting measured out in course- and fine-grained means for multi-factor defense in depth. And in the race against advanced and automated threats, software-defined systems have also made it simpler to manage security policies and automatically enforce them by applying protection within the network faster than ever before.

How is SDN Evolving Automation?

In the aftermath of the early SDN hype, the industry experienced progression on the front of orchestrating operations and regression from the focus on automating them.

The evolution of controllers made automation turnkey, focusing on what instead of how. In data centers, the dynamic machinery inside of the network was automated in step with workflow orchestrators like vSphere, OpenStack, and today, more so Kubernetes and OpenShift. In the case of SD-WAN, orchestration meant dealing in zero-touch branch onboarding and choosing WAN policy levels for application traffic. Automation to regulate traffic steering across the hybrid WAN uplinks was baked in.

These are oversimplifications of SDN applications, but across SDN domains, the common thread is that 1. control is centralized and abstracted above the device level to span a distributed fleet of infrastructure; and 2. controllers automate workflows, orchestrating, reducing and simplifying steps. The industry attention spent adapting to software-defined has, until recently, been mostly about these benefits of moving from commanding device CLIs to orchestrating controller GUIs. But orchestration implies automated networks, and changes nothing about automated network-ing. Automated network operations (NetOps) is the next frontier.

While workflows are leveled up and centralized with SDN, so are workflow APIs, another key aspect of SDN systems. With this, there is also the possibility to automate NetOps: evolving to automated testing, troubleshooting, change controls, service-level indicators, and service-level reliability. This is an area of network automation focused less on the automation technology inside software-defined networks, and more on the processes and people that will software-define operations.

In this transformational movement that network engineers have dubbed network reliability engineering (NRE), a subject we at Juniper have promoted a good deal, technology also plays an important role. SDN systems are important so that engineers are building on SDN abstractions and workflows and doing so with centralized APIs instead of down at each device.

Analytics and AI Are Driving the Future of Automation

Big data analytics, machine learning and AI are certainly sensationalized, but that’s because they show promise. If they can be applied to customer trends, voter preferences, and many other fields with complex data points, they can also be applied to IT. A prerequisite to this is already found in SDN systems: a central point of management. And networks are teeming with data to be centrally collected and processed.

Many SDN systems already incorporate telemetry, analytics and some of them leverage AI. If they don’t, you can bet it’s on the roadmap. And analytics and AI are important ingredients in automation, both within SDN systems and for improving the operations-contextual automated NetOps on top of SDN.

Since Juniper just acquired Mist, a timely example is the AI-driven Mist Cloud. Combining data points from Wi-Fi and BLE radio, wired, and wide-area networks, the Mist Cloud software uses AI to crunch location-addled degradations and dynamics. It’s able to stabilize network reliability and raise user experience problems, even outside of the WLAN. On the NetOps side of things, Mist’s AI-powered assistant, Marvis, smartens up the human touch and mitigates mistakes.

Another example is found in network reliability engineering. In the data-driven SRE and NRE culture, you can’t manage what you can’t measure. Many SDN systems have higher-order reporting, metrics and centralized telemetry APIs, aiding in the creation of service-level indicators (SLIs). SLIs are important because they map to objectives about the stakeholder-oriented service levels instead of SNMP or metrics that are meaningless to those outside of networking.

NREs also manage errors and outages by automating around them for continuous improvement. As SDN systems begin to incorporate AI-powered predictive analytics, NREs can use those signals to better uphold service levels with automated proactive steps to complement reactive remediation.

Clouds are in the Forecast

The cloud is replete with as-a-service IT offerings, but many SDN systems are still “shrink wrapped software” as downloads to run on premises or in private clouds.

Many networking teams would like to use or at least evaluate SDN, but jumping headlong into unboxed software like SDN is not accessible to smaller networking teams when they have the added work of learning to run and maintain SDN systems. For this reason, cloud-delivered SDN offerings are kindling more SDN adopters. This week, Juniper announced that it is taking its SD-WAN solution and launching the option for Contrail SD-WAN as a service.

Cloud-based SDN for all domains of networking seems interesting but won’t be right for all customers nor all use cases. Cloud-based data collection comes with mixed benefits and drawbacks.

For some, sending telemetry data to the cloud may be prohibited or prohibitively inefficient, especially in cases where certain telemetry requires a quick reaction. On the other hand, to the cloud’s benefit, if many networking teams don’t have the means to operate SDN, then they surely don’t have the means to run sophisticated analytics or AI systems on premises. A cloud service solves for simplicity. It also solves for quicker vendor innovation cycles and smarter processing because the economies of scale for data storage, analytics and now hardware-assisted AI processing based in the cloud are enormous and incomparable.

In summary, the first software-defined decade has been spectacular, and seasoned solutions are seeing a lot of production. Now, with so much contributing to SDN’s innovation and reinvention, I look forward to the next odyssey and seeing what will define software-defined. Drop a comment below to share your own anticipations.

Podcast of this blog on YouTube.

image credit moritz320/pixabay

Serverless Containers Intensify Secure Networking Requirements by James Kelly


When you’re off to the races with Kubernetes, the first order of business as a developer is figuring out a micro-services architecture and your DevOps pipeline to build pods. However, if you are the Kubernetes cluster I&O pro, also known as site reliability engineer (SRE), then your first order of business is figuring out Kubernetes itself, as the cluster becomes the computer for pod-packaged applications. One of the things a cluster SRE deals with—even for managed Kubernetes offerings like GKE—is the cluster infrastructure: servers, VMs or IaaS. These servers are known as the Kubernetes nodes.

The Pursuit of Efficiency

When you get into higher-order Kubernetes there are a few things you chase.

First, multi-purpose clusters can squeeze out more efficiency of the underlying server resources and your SRE time. By multi-purpose cluster, I mean running a multitude of applications, projects, teams/tenants, and devops pipeline stages (dev/test, build/bake, staging, production), all on the same cluster.

When you’re new to Kubernetes, such dimensions are often created on separate clusters, per project, per team, etc. As your K8s journey matures though, there is only so long you can ignore the waste this causes in the underlying server-resource capacity. Across your multicloud, consolidating many clusters into as few as practical for your reliability constraints also saves you time and less swivel-chairing for: patching, cluster upgrades, secrets and artifact distribution, compliance, monitoring, and more.

Second, there’s the constant chase of scaling efficiency. Kubernetes and active monitoring agents can help take care of auto-scaling individual micro-services, but scaling out assumes you have capacity in your cluster nodes. Especially if you’re running your cluster atop IaaS, it’s actually wasteful to maintain—and pay for—extra capacity in spare VM instances. You probably need some buffer because spinning up VMs is much slower than for containers and pods. Dynamically right-sizing your cluster is quite the predicament, particularly as it becomes more multi-purpose.

True CaaS: Serverless Containers

When it comes to right-sizing your cluster scale, while the cloud providers are happy to take your money for extra VMs powering your spare node capacity, they do have a better solution. At Re:Invent 2017, AWS announced Fargate to abstract away the servers underneath your cluster. Eventually it should support EKS in addition to ECS. In the meantime, Azure Container Instances (ACI) is a true Kubernetes-pods as a service offering that frees you from worrying about the server group upon which it’s running.

Higher-order Kubernetes SRE

While at Networking Field Day 17 (NFD17 video recording), I presented on “shifting left” your networking and security considerations to deal with DevOps and multi-purpose clusters. It turns out that on the same day, Software Engineering Daily released their Serverless Containers podcast. In listening to it you’ll realize that such serverless container stacks are probably the epitome of multi-purpose Kubernetes clusters.

What cloud providers offer in terms of separation of concerns with serverless container stacks, great cluster SREs will also aim to provide to the developers they support.

When you get to this level of maturity in Kubernetes operations, you’re thinking about a lot of things that you may not have originally considered. This happens in many areas, but certainly in networking and security. Hence me talking about “shift left,” so you can prepare to meet certain challenges that you otherwise wouldn’t see if you’re just getting Kubernetes up and running (great book by that name).

In the domain of open networking and security, there is no project that approaches the scalability and maturity of OpenContrail. You may have heard of the immortal moment, at least in the community, when AT&T chose it to run their 100+ clouds, some of enormous size. Riot Games has also blogged about how it underpins their DevOps and container runtime environments for League of Legends, one of the hugest online games around.

Cloud-grade Networking and Security

For cluster multi-tenancy, it goes without saying that it’s useful to have multi-tenant networking and security like OpenContrail provides. You can hack together isolation boundaries with access policies in simpler SDN systems (indeed, today, more popular due to their simplicity), but actually having multi-tenant domain and project isolation in your SDN system is far more elegant, scalable and sane. It’s a cleaner hierarchy to contain virtual network designs, IP address management, network policy and stateful security policy.

The other topic I covered at NFD17 is the goal of making networking and security more invisible to the cluster SRE and certainly to the developer, but providing plenty of control and visibility to the security and network reliability engineers (NREs) or NetOps/SecOps pros. OpenContrail helps here in two crucial ways.

First, virtual network overlays are a first-class concept and object. This is very useful for your DevOps pipeline because you can create exactly the same networking and security environment for your staging and production deployments (here’s how Riot does it). Landmines lurk when staging and production aren’t really the same, but with OpenContrail you can easily have exactly the same IP subnets, addresses, networking and security policies. This is impossible and impractical to do without overlays. You may also perceive that overlays are themselves a healthy separation of concerns from the underlay transport network. That’s true, and they easily enable you to use OpenContrail across the multicloud on any infrastructure. You can even nest OpenContrail inside of lower-layer OpenContrail overlays, although for OpenStack underlays, it provides ways to collapse such layers too.

Second, OpenContrail can secure applications on Kubernetes with better invisibility to your developers—and transparency to SecOps. Today, a CNI provider for Kubernetes implements pod connectivity and usually NetworkPolicy objects. OpenContrail does this too, and much more that other CNI providers cannot. But do you really want to require your developers to write Kubernetes NetworkPolicy objects to blacklist-whitelist the inter-micro-service access across application tiers, DevOps stages, namespaces, etc? I’d love to say security is shifting left into developers’ minds, and that they’ll get this right, but realistically when they have to write code, tests, fixes, documentation and more, why not take this off their plates? With OpenContrail you can easily implement security policies that are outside of Kubernetes and outside of the developers’ purview. I think that’s a good idea for the sanity of developers, but also to solve growing security complexity in multi-purpose clusters.

If you’ve made it this far, I hope you won’t be willfully blind to the Kubernetes SRE-fu you’ll need sooner or later. Definitely give OpenContrail a try for your K8s networking-security needs. The community has recently made it much more accessible to quick-start with Helm packaging, and the work continues to make day-1 as easy as possible. The Slack team is also helpful. The good news is that with the OpenContrail project very battle tested and going on 5 years old, your day-N should be smooth and steady.

PS. OpenContrail will soon be joining Linux Foundation Networking, and likely renamed, making this article a vestige of early SDN and cloud-native antiquity.

image credit alexandersonscc / Pixabay

#SimpleAF Networking in 2018 by James Kelly


Last week at its flagship customer event, NXTWORK, Juniper Networks set a valiant vision for its role in the future of networking: “Engineering. Simplicity.” Next week as we take respite from engineering and set some 2018 goals, simplicity sounds good. Here are some of my ideas inspired by engineering for simplicity and Juniper’s new #simpleAF tag. Simple as fishsticks…of course.

Da Vinci called simplicity: ultimate sophistication. It would come more easily to those solving more primitive challenges, but maybe you, like Juniper, audaciously tackle cloud-grade problems, and in such domains and at such scale, simplicity is anything but simple to achieve.

The thing about creating #simpleAF simplicity is that it’s not just about tools or products. If you’re a  network operator, another tool won’t make a revolutionary nor lasting impact toward simplifying your life any more than the momentary joy of a holiday gift. Big impacts and life changes start inside out. They don’t happen have-do-be, rather they are be-do-have. Juniper is doing its own work to put simplicity at its company’s core being, but this article, besides some gratuitous predictions, is about transforming your network operations, putting simplicity at your core for a happier prosperous 2018.

Be… the NetOps Change

To be the engineer of network simplicity, it’s time to lose the title of network admin, operator or architect and embrace a new identity as a:

Network Reliability Engineer

Just like sysadmins have graduated from technicians to technologists as SREs, the NRE title is a declaration of a new culture and serves as the zenith for all that we do and have as engineers of network invincibility. And where could invincibility be more important than at the base of the infrastructure that connects everything.

Start with this bold title, and fake it until you make it who you truly are.

Do… It Like They Do on the DevOps Channel

Just like SREs describe their practices as DevOps, network reliability engineering embraces DevNetOps.

While Dev and (app) Ops are working closely together atop cloud-native infrastructures like Kubernetes, the cluster SRE is the crucial ops role that creates operational simplicity by design of separation of concerns. Similarly, the NRE can design simplicity by offering up an API contract to the network for its consumers—probably the IaaS and cluster SREs in fact.

The lesson here is that boundaries are healthy. Separate concerns by APIs and SLA contracts to deliver simplicity to yourself as well as up the chain, whether that is to another overlay network or another kind of orchestration system.

It’s important that your foundational network layers achieve simplicity and elegance too. Trying to put an abstraction layer around and over top of a hot mess is a gift to no one, least of all yourself if it’s your mess.

So how do we clean up the painful mess that is the job of operating networks today? I’ve examined borrowing the good habits of SREs and DevOps pros before, in the shape of DevNetOps blogs and slideshares. Here is a quick recap of good behaviors to move you from the naughty to the nice list, along with my predictions for 2018.

1.     Start with Networks as Code
Prediction: When people say the network CLI is  dead, they jump straight to thinking about GUIs. Well for provisioning changes, you ought to start thinking about Eclipse Che or an IDE instead of product GUIs, and start thinking about GUIs more for observability and management by metrics. Networks as code start with good coding logistics. This coming year, I think we’ll see DevNetOpserati practice this simple truth and realize the harder one that networks as code is better on top of automated networking systems themselves. In other words, networking and config as code belong on top of SDN “intent-driven” systems, not box-by-box configurations in your github repo; nevertheless, that may be a good step depending on where you’re at in your journey.

2.     Orchestrate Your Timeline as a Pipeline
Prediction: It follows from coding habits that you follow a review and testing process for continuous integration (CI). While vendors dabble in testing automation services and simulation tools, I predict that we will see more of a focus on these in 2018 and opinionated tools that orchestrate the process workflow of CI and eventually continuous delivery/deployment (CD). This is whitespace for vendor offerings right now, and the task is ripe for truly upleveling operator simplicity with process innovation. While the industry talks about automation systems like event-driven infrastructure (EDI) and continuous response, mature DevOps tooling is ready and waiting for DevNetOps CI process pipeline automation. Furthermore, it’s more accessible in terms of codifying or programming with a higher reliability ROI.

3.     Micro and Immutable Architecture
Prediction: 2017 was the year everyone went koo koo for Kubernetes and containers. It has mainstream adoption for many kinds of applications, but networking systems from OSs to SDNs to VNFs are all lagging on the curve of refactoring into containers. I’ve reported on how finer-grained architectures are the most felicitous for DevOps, DevNetOps, and the software and hardware transformation that networking needs in order to achieve the agility of CD. We must properly architect before we automate. We’ve started to see containers hit some SDN systems in 2017, but I predict 2018 will be the year we start to see VNF waypoints as containers with multi-interface support in CNI and a maturation of higher-order networking in the ruling orchestrator, Kubernetes.

4.     Orchestrated CD from Your Pipeline
Prediction: I’m doubtful that we’ll hit this mark in 2018 in the area of DevNetOps, mostly because of some of the above prerequisites. On the flip side, it’s obvious that in 2018 we are going to see the network play a big role in micro- services CD thanks to the rise of service meshes that do a much better job of canary and blue-green deployments. CD for actual networking systems is likely experimental in 2018 or bleeding edge for some SDN systems.

5.     Resiliency Design and Drills
Prediction: This is the fun practice of chaos engineering, designing and automating around failures to stave off black-swan catastrophes. This design is already showing up in preference for more smaller boxes instead of fewer larger boxes. Most enterprises are also getting around to SD-WAN to use simultaneous hybrid WAN connectivity options. There is more to do here in terms of tooling and testing drills in CI pipelines, as well as embracing the “sadistic” side of the NRE culture that kills things for fun to measure and plan for invincibility or automatic recovery. In 2018, I think this will continue to focus on evolving availability designs until we make more progress in areas 2 and 3 above.

6.     Continuous Measurement
Prediction: The SRE and NRE culture is one of management by metrics; thus, analytics are imperative. There is always progress happening in the area of telemetry and analytics, and I know at Juniper we continue to push forward OpenNTI, JTI and AppFormix. 2018 beckons with the opportunity to do more with collection systems like Prometheus and AppFormix, employing narrow-AI deep learning for the first time to raise new insights beyond normal human observation.

7.     Continuous Response
Prediction: With the 2017 rage around intent-based or -driven networking, there is sure to be progress in 2018. While the past few years have focused on EDI, I think the most useful EDI actions are largely poised to get subsumed into SDN systems with controllers in them, similar to how Kubernetes’ controllers implement the continuous observation and correction to the declared state. There is however a tribe of NREs that look to automate across networking systems and other IT systems like incident management, ticketing and ChatOps tools. As I wrote about in May, I think the maturing serverless or FaaS space will eventually win as the right platform for these custom EDI actions.

Have… a Happier Holiday

As an NRE, it’s not just about doing DevNetOps behaviors or processes nor is it about having the greatest tools or code. When you’re network reliability engineering for simplicity, it’s equally about what’s really important when you take the NRE cape off and go home: not getting called in and enjoying a happier holiday. And so, simplicity is what I wish for you this holiday season, and for the next one, may you be further down the road of engineering simplicity.

image credit Manuchi/Pixabay