Serverless Can Smarten Up Your DevOps & Infrastructure Stack by James Kelly

As IT organizations and cultures reshape themselves for speed, we’ve seen a trend away from IT as a service provider, to IT as:

  • a collection of smaller micro-service provider teams,
  • each operating and measuring like autonomous (micro-)product teams,
  • each building and running technology systems as micro-services.

Now you’ve heard me cover the intersection of microservices and containers quite a bit, along with CaaS/Kubernetes or PaaS/OpenShift orchestration. In this blog, I’m going to change it up a bit and talk about the other kind of microservices building block: code functions.

Here I’m talking about microservices run atop of a serverless computing a.k.a. function-as-a-service (FaaS) stack. Serverless is a 2-years-new yet already well-covered topic. If you need a primer this one is the best I’ve found.

In my last blog, I examined avoiding cloud lock-in by bringing your own devops and software-defined infrastructure stack: bringing your own open-source/standard toolchain across public/private IaaS, you maintain portability and harmonized, if not unified, hybrid cloud policy. This software-defined infrastructure stack, underpinning the user- or business-facing applications, is also still just a bunch of software applications. FaaS is just one example of an increasingly popular app-developer technology that can be employed by infrastructure developers too, though this one hasn’t gotten much attention yet in that regard.

If you read the primer above, you won’t have learned that the popularity of FaaS systems like Lambda to reduce developer friction has led to a number of open source projects like OpenWhisk, and others that to sit atop of a CaaS/PaaS (such as Fission or Kubeless). If a FaaS is likely to exist in your application stack, it may well make sense to standardize on incorporating one into your devops and infrastructure stack. Let’s see.


When it comes to I&O events, does your mind go to a pager waking somebody up in the middle of the night? Better to wake up a function and keep sleeping.

The main area of natural applicability for FaaS is in event processing. Such code functions after all are event handlers.

In the devops world, for development and application lifecycle events, FaaS presents a simple way to further automate. The scale is probably low for most CI/CD events, so the main benefit here of employing FaaS is the simplicity and agility in your pipelines as code. It also helps foster the change and process innovation to do pipelines as code in the first place. In the devops area of continuous response (CR), that Randy Bias coins here, there is almost certainly more scale and importance to respond to events, where the property of FaaS to handle on-demand bursty scale could also be a significant benefit.

In the software-defined infrastructure world, there are traditional events such as change logs, alarms, security alerts, and modern telemetry and analytics events. Of all the types, analytics, telemetry and security events strike me as the area where FaaS could present the biggest benefit and most interesting examination here. Similar to how for CR, analytics for an application (processing telemetry such as API calls, clicks, and other user events) needs scale, software-defined infrastructure is increasingly generating massive amounts of telemetry data that needs processing.

Systems like Juniper Networks AppFormix and Contrail Networking already do some telemetry collection and processing (see for yourself). AppFormix, a platform for smart operations management, in fact, processes much of the telemetry with light machine learning on the edge node on which it’s generated, so that sending the big data for centralized heavy-duty crunching and machine learning in the centralized control plane is more efficient. We at Juniper have written before about this distributed processing model and its benefits such as realer real-time visibility monitoring to show you what’s happening because… the devops call, “You build it, you run it,” sounds great until you’re running in a multi-tenant/team/app environment and chaos ensues. AppFormix also has also a smart policy engine, that is using static thresholds as well as anomaly detection to generate... you guessed it... events!

Just like devops event hooks can trigger a FaaS function, so can these software-defined infrastructure events.

Obviously a system like AppFormix implements much of the CR machine-learning smart glue for the analytics of your stack and its workloads; smart-driving ops as I call it. Other systems like Contrail Networking with its analytics Kafka or API interface are rawer. The same is true of other SD-infrastructure components like raw Kubernetes watcher events with which some FaaS systems like Fission already integrate.

The more raw the events, I bet, the more frequent they are too, so you have to decide at what point the frequency becomes so high that it’s more efficient to run a full-time process instead of functions on demand. That depends on the scale with which you want to run your own FaaS tool, or use a public cloud FaaS service (of course that only makes sense if you’re running the stack on that public cloud, and want to tradeoff some lock-in until they mature as usable in a standard way…worth seeing Serverless Framework here).


One of the ends that you could have in mind to run operational-based functions is greater operational intelligence. As we’ve discussed, AppFormix does this to a good extent, but it does it within a cluster. We can consider further smart automation use cases in-cluster, but using a FaaS external to your clusters, your infrastructure developers could scope your intelligence deeper or broader too.

For example, you could go deeper, processing telemetry in customized ways that AppFormix does not today. Imagine you’re running on GCP. Well you could do deep learning on telemetry data, even with TPUs I might add. To borrow a metaphor from a common ML example of image recognition, where AppFormix might be able to tell you, “I see a bird,” a powerful deep learning engine with TPU speed up, may, in real-enough time, be able to tell you, “I see a peregrine falcon.” In more practical terms, you may be able to go from detecting anomalies or security alerts to operations and security situational awareness where your systems can run themselves better than humans. After all, they already can at lower levels thanks to systems like Kubernetes, Marathon and AppFormix.

Rather than deeper, broader examples could be use cases that extend beyond your cluster. For example, auto-scaling or auto-healing of the cluster by integrating with physical DCIM systems or underlying IaaS. Yes, it’s serverless managing your servers folks!

In closing, there is plenty more to be written in the future of serverless. I expect we’ll see something soon in the CNCF, and then we’ll definitely start hearing of more application use cases. But hopefully, this blog sparked an idea or two about its usefulness beyond the frequent http-request and API-callback use cases for high-level apps. (Please share your ideas). For intent-driven, self-driving, smart-driving infrastructure, I think a FaaS track is emerging.

What is Open Good For? by James Kelly

Last week in a conversation with a customer I was asked about the differences between ODL and OpenContrail and if NorthStar was going open source as well. That discussion seemed interesting for a blog on approaching “open” networking, especially given we’ve just come thru the Open Networking Summit earlier this month.

Open can manifest in a lot of ways. In my mind, chiefly, as open source, open standards, and open interfaces. Personally “open interfaces (APIs)” seems like a misnomer to me, but it seems too late to change.

Open standards have always been the underpinning of network component interoperability. The degree to which networks are interoperable is the degree to which they can peer and grow into larger networks, so this is vital for networks in a world with explosive desire for connectedness of people and things all the time and everywhere.

In developing OpenContrail and NorthStar, you’ll notice the implementation bias toward using proven and existing open standard protocols like MBGP and others, rather than employing new protocols like OpenFlow for example. This is important because we’re building platforms AND products. Customers buying the product care about investment protection and an evolution from where they are to the future they will realize as they roll out the product more and more. Notice how this is at odds with pure technologists’ desire to optimize the technology, regardless of whether reinvention is necessary or not in their view. When building products, we simply must incorporate a bias to evolution over revolution.

Open source is another topic with its virtues and its own problems. The ecosystem and community built around a cause is wonderful for innovation. Projects like Linux and OpenStack offer customers greater options of which vendor they want to use for the commercial support, and with some flexibility to change between them and leverage innovations across them easier than wholly closed-source vendor solutions.  On the side of open source challenges, these ecosystems can tend to be driven by technologists and they don’t necessarily rally around specific customer business problems. It takes vendors to best focus on the customer, and with that does come some lock-in. Open source doesn’t necessarily mitigate all vendor lock-in as some would tout, but open (common) interfaces, a frequent byproduct of open source projects, does help in this regard.

Open interfaces help to create isolation in layers of the software stack; thus, we get the option to switch out and recompose the layers and that option truly lowers vendor lock-in. Interfaces effectively decouple. Decoupling can also be built into systems by choice. For example, Juniper designed OpenContrail to not rely at all on the physical network except IP reachability. That creates true freedom for the customer, breaking an area where lock-in may otherwise set in.

Are all these axes of openness needed or best applied universally? Probably not. When you look at cloud infrastructure, open source is trending strongly. The ETSI NFV ISG is even creating its reference architectures on open source infrastructure, and in genernal, trying to keep IT up-to-speed with the public clouds, Web 2.0s, and hyperscale enterprises demands keeping up to the speed of innovation that an open source ecosystem provides. OpenContrail fills this demand perfectly, coming to the open source cloud infrastructure party with a lot to offer because it was incubated to maturity before launching. When it comes to applying SDN paradigms in the WAN (also read backbone/core), this open source party simply doesn’t exist. Therefore, I don’t see a need to open source Juniper’s NorthStar controller. Incubating it in-house until its later release, we will ensure we align all of our internal technologists (Juniper is known for wickedly smart and passionate engineers) in the direction of customer needs.

I know I still haven’t answered with respect to OpenDaylight (ODL) vs. OpenContrail and NorthStar. There are a bunch of things to say like how Juniper Contrail used OpenContrail code exactly (not just a a basis to hack into a product), but my favorite part is my car analogy. OpenDayight is positioned as platform for SDN and innovation, but these are very broad goals. SDN and innovation are an umbrella across networking domains, but for totally separate solution areas like high-IQ WAN networks and cloud infrastructure for virtualized data centers and the service-enabled SP edge, we are best served by separate but complementary products. As I said to our customer, “If you had an F1 race on the track and also an off-road race on a course with obstacles, then you would take separate cars, that is, if you want to win.”

This was originally published on the Juniper Networks blog and ported to my new blog here for posterity.