Technology

Serverless Can Smarten Up Your DevOps & Infrastructure Stack by James Kelly

As IT organizations and cultures reshape themselves for speed, we’ve seen a trend away from IT as a service provider and toward IT as:

  • a collection of smaller micro-service provider teams,
  • each operating and measuring itself like an autonomous (micro-)product team,
  • each building and running its technology systems as microservices.

Now you’ve heard me cover the intersection of microservices and containers quite a bit, along with CaaS/Kubernetes or PaaS/OpenShift orchestration. In this blog, I’m going to change it up a bit and talk about the other kind of microservices building block: code functions.

Here I’m talking about microservices that run atop a serverless computing stack, a.k.a. function-as-a-service (FaaS). Serverless is barely two years old, yet it’s already a well-covered topic. If you need a primer, this one is the best I’ve found.

In my last blog, I examined avoiding cloud lock-in by bringing your own devops and software-defined infrastructure stack: with your own open-source/standards-based toolchain across public and private IaaS, you maintain portability and a harmonized, if not unified, hybrid cloud policy. This software-defined infrastructure stack, underpinning the user- or business-facing applications, is itself still just a bunch of software applications. FaaS is just one example of an increasingly popular app-developer technology that infrastructure developers can employ too, though it hasn’t gotten much attention yet in that regard.

What you won’t have learned from the primer above is that the popularity of FaaS systems like Lambda in reducing developer friction has led to a number of open source projects like OpenWhisk, and others that sit atop a CaaS/PaaS (such as Fission or Kubeless). If a FaaS is likely to exist in your application stack, it may well make sense to standardize on incorporating one into your devops and infrastructure stack too. Let’s see.

I&O EVENTS

When it comes to I&O events, does your mind go to a pager waking somebody up in the middle of the night? Better to wake up a function and keep sleeping.

The main area of natural applicability for FaaS is event processing. Such code functions, after all, are event handlers.
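
To make that concrete, below is a minimal sketch of such a handler, written against an AWS Lambda-style handler(event, context) signature; the event fields and the routing logic are illustrative assumptions, not any particular system’s schema.

```python
# A minimal sketch of a FaaS event handler (AWS Lambda-style signature).
# The event fields ("source", "detail") are illustrative, not a real schema.
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handler(event, context):
    """Wake up, handle one event, go back to 'sleep' (i.e., exit)."""
    source = event.get("source", "unknown")
    detail = event.get("detail", {})
    logger.info("event from %s: %s", source, json.dumps(detail))

    # Route the event to a pointed piece of automation.
    if source == "monitoring" and detail.get("severity") == "critical":
        return {"action": "page-oncall", "ok": True}
    return {"action": "log-only", "ok": True}
```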

In the devops world, for development and application lifecycle events, FaaS presents a simple way to further automate. The scale is probably low for most CI/CD events, so the main benefit of employing FaaS here is the simplicity and agility it lends your pipelines as code. It also helps foster the change and process innovation needed to do pipelines as code in the first place. In the devops area of continuous response (CR), a term Randy Bias coins here, there is almost certainly more scale and importance in responding to events, and there the ability of FaaS to handle on-demand, bursty scale could be a significant benefit as well.
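
As a rough sketch of the pipelines-as-code idea, here is a hypothetical function reacting to a CI webhook and promoting a successful build; the payload fields and the deploy endpoint are invented for illustration and don’t correspond to any specific CI system’s API.

```python
# Hypothetical CI/CD event handler: promote a build when its pipeline succeeds.
# The payload shape and the deploy endpoint are illustrative assumptions.
import json
import urllib.request

DEPLOY_HOOK = "https://deploy.example.com/hooks/promote"  # hypothetical endpoint

def handler(event, context):
    payload = json.loads(event.get("body", "{}"))
    if payload.get("status") != "success" or payload.get("branch") != "main":
        return {"statusCode": 200, "body": "ignored"}

    # Trigger the next pipeline stage (e.g., promote the image to staging).
    req = urllib.request.Request(
        DEPLOY_HOOK,
        data=json.dumps({"image": payload.get("image")}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return {"statusCode": 200, "body": f"promotion triggered: {resp.status}"}
```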

In the software-defined infrastructure world, there are traditional events such as change logs, alarms and security alerts, as well as modern telemetry and analytics events. Of all of these, analytics, telemetry and security events strike me as the area where FaaS could present the biggest benefit and the most interesting examination here. Just as application analytics for CR (processing telemetry such as API calls, clicks and other user events) needs scale, software-defined infrastructure is generating increasingly massive amounts of telemetry data that needs processing.
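
A sketch of what that bursty processing might look like: a function handed a batch of telemetry records that flags over-utilized interfaces. The record fields and the threshold are assumptions for illustration.

```python
# Sketch of a telemetry-processing function invoked with a batch of records.
# Record fields ("interface", "util_pct") and the threshold are assumptions.
UTIL_THRESHOLD_PCT = 90.0

def handler(event, context):
    records = event.get("records", [])
    hot = [
        r["interface"]
        for r in records
        if r.get("util_pct", 0.0) > UTIL_THRESHOLD_PCT
    ]
    # Downstream, this result could raise an alarm, open a ticket, or
    # trigger another function that reroutes traffic.
    return {"checked": len(records), "hot_interfaces": sorted(set(hot))}
```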

Systems like Juniper Networks AppFormix and Contrail Networking already do some telemetry collection and processing (see for yourself). AppFormix, a platform for smart operations management, in fact processes much of the telemetry with light machine learning on the edge node where it’s generated, so that shipping the big data to the centralized control plane for heavy-duty crunching and machine learning is more efficient. We at Juniper have written before about this distributed processing model and its benefits, such as truer real-time visibility and monitoring to show you what’s happening, because… the devops call, “You build it, you run it,” sounds great until you’re running in a multi-tenant/team/app environment and chaos ensues. AppFormix also has a smart policy engine that uses static thresholds as well as anomaly detection to generate... you guessed it... events!

Just like devops event hooks can trigger a FaaS function, so can these software-defined infrastructure events.

Obviously a system like AppFormix implements much of the CR machine-learning smart glue for the analytics of your stack and its workloads; smart-driving ops, as I call it. Other systems, like Contrail Networking with its Kafka analytics or API interface, are rawer. The same is true of other SD-infrastructure components, like the raw Kubernetes watcher events with which some FaaS systems like Fission already integrate.
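
For instance, here’s a minimal sketch that uses the official Kubernetes Python client to watch raw pod events and forward each one to a FaaS trigger over HTTP, which is conceptually how a Fission-style integration consumes watcher events; the gateway URL and the event payload shape are assumptions.

```python
# Sketch: watch raw Kubernetes pod events and forward each one to a FaaS
# endpoint over HTTP. Requires the 'kubernetes' Python client and a kubeconfig.
# The FAAS_URL endpoint is a hypothetical function trigger, not a real API.
import json
import urllib.request

from kubernetes import client, config, watch

FAAS_URL = "http://faas-gateway.example.local/fn/pod-event"  # hypothetical

def main():
    config.load_kube_config()          # or config.load_incluster_config()
    v1 = client.CoreV1Api()
    w = watch.Watch()
    for event in w.stream(v1.list_pod_for_all_namespaces, timeout_seconds=60):
        body = {
            "type": event["type"],     # ADDED / MODIFIED / DELETED
            "pod": event["object"].metadata.name,
            "namespace": event["object"].metadata.namespace,
        }
        req = urllib.request.Request(
            FAAS_URL, data=json.dumps(body).encode(),
            headers={"Content-Type": "application/json"}, method="POST",
        )
        urllib.request.urlopen(req)

if __name__ == "__main__":
    main()
```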

The rawer the events, I bet, the more frequent they are too, so you have to decide at what point the frequency becomes so high that it’s more efficient to run a full-time process instead of functions on demand. That depends on the scale at which you want to run your own FaaS tool, or use a public cloud FaaS service (of course that only makes sense if you’re running the stack on that public cloud, and want to trade off some lock-in until these services mature enough to be usable in a standard way… worth seeing the Serverless Framework here).
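
Here’s a back-of-the-envelope way to reason about that break-even point; all of the prices and durations are made-up inputs that you’d replace with your provider’s actual rates and your function’s measured profile.

```python
# Back-of-the-envelope break-even between on-demand functions and an
# always-on consumer. All prices and durations below are made-up inputs;
# substitute your provider's actual rates.
PRICE_PER_INVOCATION = 0.0000002      # $ per request (assumed)
PRICE_PER_GB_SECOND = 0.0000166667    # $ per GB-second (assumed)
FN_MEMORY_GB = 0.128                  # function memory size (assumed)
FN_DURATION_S = 0.2                   # average run time per event (assumed)
ALWAYS_ON_MONTHLY = 15.0              # $ per month for a small dedicated worker (assumed)

cost_per_event = PRICE_PER_INVOCATION + PRICE_PER_GB_SECOND * FN_MEMORY_GB * FN_DURATION_S
break_even_events_per_month = ALWAYS_ON_MONTHLY / cost_per_event

print(f"cost per event:  ${cost_per_event:.8f}")
print(f"break-even rate: {break_even_events_per_month:,.0f} events/month "
      f"(~{break_even_events_per_month / (30 * 24 * 3600):.1f}/second)")
```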

TO WHAT END?

One of the ends you could have in mind for running operations-focused functions is greater operational intelligence. As we’ve discussed, AppFormix does this to a good extent, but it does it within a cluster. We can consider further smart automation use cases in-cluster, but by using a FaaS external to your clusters, your infrastructure developers could also scope that intelligence deeper or broader.

For example, you could go deeper, processing telemetry in customized ways that AppFormix does not today. Imagine you’re running on GCP. Well, you could do deep learning on telemetry data, even with TPUs I might add. To borrow a metaphor from the common ML example of image recognition: where AppFormix might be able to tell you, “I see a bird,” a powerful deep learning engine with TPU speed-up may, in real-enough time, be able to tell you, “I see a peregrine falcon.” In more practical terms, you may be able to go from detecting anomalies or security alerts to operations and security situational awareness, where your systems can run themselves better than humans. After all, they already can at lower levels, thanks to systems like Kubernetes, Marathon and AppFormix.

Rather than going deeper, broader examples are use cases that extend beyond your cluster: for example, auto-scaling or auto-healing of the cluster itself by integrating with physical DCIM systems or the underlying IaaS. Yes, it’s serverless managing your servers, folks!
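
As a sketch of that last idea, here’s a hypothetical function that reacts to a capacity alarm by nudging up the desired size of the cluster’s node pool through the underlying IaaS, assuming the nodes live in an AWS Auto Scaling group; the alarm field and group name are invented for illustration.

```python
# Sketch of "serverless managing your servers": a function that reacts to a
# capacity alarm by growing the cluster's node pool via the underlying IaaS.
# Assumes the nodes live in an AWS Auto Scaling group; the event fields and
# group name are illustrative, not from any specific monitoring system.
import boto3

ASG_NAME = "k8s-worker-nodes"   # hypothetical Auto Scaling group
MAX_NODES = 20

def handler(event, context):
    asg = boto3.client("autoscaling")
    group = asg.describe_auto_scaling_groups(
        AutoScalingGroupNames=[ASG_NAME]
    )["AutoScalingGroups"][0]

    current = group["DesiredCapacity"]
    if event.get("alarm") == "cluster-capacity-low" and current < MAX_NODES:
        asg.set_desired_capacity(
            AutoScalingGroupName=ASG_NAME,
            DesiredCapacity=current + 1,
            HonorCooldown=True,
        )
        return {"scaled_to": current + 1}
    return {"scaled_to": current}
```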

In closing, there is plenty more to be written in the future of serverless. I expect we’ll see something soon in the CNCF, and then we’ll definitely start hearing of more application use cases. But hopefully this blog sparked an idea or two about its usefulness beyond the frequent HTTP-request and API-callback use cases for high-level apps. (Please share your ideas.) For intent-driven, self-driving, smart-driving infrastructure, I think a FaaS track is emerging.

A Contrarian Viewpoint on Container Networking by James Kelly

With DockerCon in Austin happening this week, I’m reminded of last year’s DockerCon Seattle, and watching some announcements with utter fascination or utter disappointment. Let’s see if we can’t turn the disappointments into positives.

The first disappointment has a recent happy ending. It was a broadly shared observation in Seattle, the media, and discussion forums: Docker was overstepping when they bundled Swarm into the core of Docker Engine in 1.12. This led to the trial balloon that forking Docker was a potential path to a lighter-weight Docker that could serve in Mesos and Kubernetes too. Last September I covered this in my blog, sparing little disdain for the idea of forking Docker Engine simply because it had become too monolithic. There are other good options, and I’m happy to say Docker heeded the community’s outcry and cleanly broke out a component of Docker Engine called containerd, which is the crux of the container runtime. This gets back to the elegant, Unix-tool-inspired modularization and composition, and I’m glad to see containerd and rkt have recently been accepted into the CNCF. Crisis #1 averted.

My next disappointment was not so widely shared, and in fact it is still a problem at large today: the viewpoint on container networking. Let’s have a look.

Is container networking special?

When it comes to containers, there’s a massive outpour of innovation from both mature vendors and startups alike. When it comes to SDN there’s no exception.

Many of these solutions you can discount because, as I said in my last blog, as shiny and interesting as they may be on the surface or in the community, their simplicity is quickly discovered to be a double-edged sword. In other words, they may be easy to get going and wrap your head around, but they have serious performance issues, security issues, scale issues, and “experiential cliffs,” to borrow a turn of phrase from the Kubernetes founders when they commented on the sometimes over-simplicity of many PaaS systems (that is, they hit a use case where the system just can’t deliver the experience or feature that’s needed).

Back to DockerCon Seattle…

Let’s put aside the SDN startups that, to various extents, suffer from that over-simplicity or a lack of soak and development time, leading to the issues above. The thing that really grinds my gears about last year’s DockerCon boils down to Docker, a powerful voice in the community, advocating that container networking was making serious strides while at the same time using the most primitive of statements (and solutions) possible, introducing “Multi-host networking.”

You may recall my social post/poke at the photo of this slide with my sarcastic caption.

Of course, Docker was talking about their overlay-based approach to networking, launched as the (then) new default mode to enable networking in Swarm clusters. The problem is that most of the community are not SDN experts, and so they really don’t know any better than to believe this is an aww!-worthy contribution. A few of us who have long worked in networking were less impressed.

Because of the attention that container projects get, Docker being the biggest, these kinds of SDN solutions are still seen today by the wider community of users as good networking choices because they easily work in the very basic CaaS use cases most users start playing with. Just because they work for your cluster today, however, doesn’t make them a solid choice. In the future your netops team will ask about X, Y and Z (and yet more stuff down the road they won’t have the foresight to see today). Also, in the future you’ll expand and mature your use cases and start to care about non-functional traits of the network, which often happens too late, in production or when problems arise. I totally get it. Networking isn’t the first thing you want to think about in the cool new world of container stacks. It’s down in the weeds. It’s more exciting to contemplate the orchestration layer, and things we understand like our applications.

On top of the fact that many of these new SDN players offer primitive solutions with hidden pitfalls you won’t see until it’s too late, another less pardonable nuisance is that most of them perpetuate the myth that container networking is somehow special. I’ve heard this in various verbiage over the ~7 years that SDN has been rising for cloud use cases. Just this week, I read a meetup description that started, “Containers require a new approach to networking.” Because of all the commotion in the container community, with plenty of new SDN projects and companies having popped up, you may be duped into believing that, but it’s completely false. These players have a vested interest, though, in making you see it that way.

The truth about networking containers

The truth is that while workload connectivity to the network may change with containers (see CNM or CNI) or with the next new thing, the network itself doesn’t need to change to address the new endpoint type. Where networks did need some work, however, was in plugging into the orchestration systems. Networks needed better programmability and then integration to connect up workloads in lock-step with how the orchestration system created, deleted and moved them. That meant plugging into systems like vSphere, OpenStack, Kubernetes, etc. In dealing with that challenge, there were again two mindsets to making the network more programmable, automated and agile: one camp created totally net-new solutions with entirely new protocols (OpenFlow, STT, VXLAN, VPP, etc.), and the other camp used existing protocols to build new, more dynamic solutions that met the new needs.
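
For context on that workload-connectivity side, a CNI plugin is just an executable that the container runtime invokes with a command in the CNI_COMMAND environment variable and the network config JSON on stdin. Here’s a bare-bones sketch; it returns a hard-coded result instead of doing real IPAM or interface plumbing.

```python
#!/usr/bin/env python3
# Bare-bones sketch of a CNI plugin: the container runtime execs this program
# with CNI_COMMAND set in the environment and the network config on stdin.
# The hard-coded address below is illustrative; a real plugin would allocate
# an IP and actually plumb the interface into the container's netns.
import json
import os
import sys

def main():
    command = os.environ.get("CNI_COMMAND", "")
    conf = json.load(sys.stdin)

    if command == "VERSION":
        print(json.dumps({"cniVersion": "0.3.1",
                          "supportedVersions": ["0.3.0", "0.3.1"]}))
    elif command == "ADD":
        # A real plugin would create a veth pair, move one end into
        # CNI_NETNS, assign an address from IPAM, and install routes.
        result = {
            "cniVersion": conf.get("cniVersion", "0.3.1"),
            "interfaces": [{"name": os.environ.get("CNI_IFNAME", "eth0")}],
            "ips": [{"version": "4",
                     "address": "10.244.1.10/24",   # illustrative only
                     "gateway": "10.244.1.1"}],
        }
        print(json.dumps(result))
    elif command == "DEL":
        pass  # tear down whatever ADD created; nothing to do in this sketch
    else:
        print(json.dumps({"msg": f"unsupported CNI_COMMAND {command}"}),
              file=sys.stderr)
        sys.exit(1)

if __name__ == "__main__":
    main()
```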

Today the first camp solutions are falling by the wayside, and the camp that built based on existing open standards and with interoperability in mind is clearly winning. OpenContrail is the most successful of these solutions.

The truth about networks is that they are pervasive and they connect everything, so interoperability is key.

  1. Interoperability across networks: If you build a network that is an island of connectivity, it can’t be successful. If you build a network that requires special/new gateways, then it doesn’t connect quickly and easily to other networks using existing standards, and it won’t be successful.
  2. Interoperability across endpoint connections: If you build a network that is brilliant at connecting only containers, even if it’s interoperable with other networks, then you’ve still created an island. It’s an island of operational context, because the ops team needs a different solution for connecting bare-metal nodes and virtual machines.
  3. Interoperability across infrastructure: If your SDN solution requires a lot from the underlay or underlying infrastructure, it’s a failure. We’ve seen this with SDNs like NSX that required multicast in the underlay. We’ve seen it with ACI, which requires Cisco switches to work. We’ve even seen great SDN solutions in the public cloud, but they’re specific to AWS or GCP. If your SDN solution isn’t portable anywhere, or at least to most places, then it’s still doomed.

If you want one unified network, you need one SDN solution

This aspect of interoperability and portability actually applies to many IT tools if you’re really going to realize a hybrid cloud and streamline ops, but perhaps nowhere is it more important than in the network because of its inherently pervasive nature.

If you’re at DockerCon this week, you’ll be happy to know that the best solution for container networking, OpenContrail, is also the best SDN for Kubernetes, Mesos, OpenStack, NFV, bare-metal node network automation, and VMware. While this is one SDN to rule and connect them all, and very feature-rich in its fourth year of development, it’s also never been more approachable, both commercially turn-key and in open source. You can deploy it on top of any public cloud or atop private clouds with OpenStack or VMware, or equally easily on bare-metal CaaS, especially with Kubernetes, thanks to Helm.

Please drop by the Juniper Networks booth for an OpenContrail demo and a sticker for your laptop or phone, and also visit the booths of Juniper partners that have Juniper Contrail Networking integrations: Red Hat, Mirantis, Canonical, and, now happily welcomed to the party, Platform9. We at Juniper will be showcasing a joint demo with Platform9 that you can read more about on the Platform9 blog.

PS. If you’re running your CaaS atop OpenStack, there’s even more reason to stop by and get a sneak peek of what you’ll also hear more about at the upcoming Red Hat and OpenStack Summits in Boston.

The Best SDN for OpenStack, now for Kubernetes by James Kelly

In between two weeks of events as I write this, I got to stop at home and in the office for only a day, but I was excited by what I saw on Friday afternoon… The best SDN and network automation solution out there, OpenContrail, is now set to rule the Kubernetes stage too. With the title of “#1 SDN for OpenStack” secured for the last two years, the OpenContrail community is now ready for Kubernetes (K8s), the hottest stack out there and a passion of mine for which I’ve been advocating more and more attention around the office at Juniper.

It couldn’t be better timing, heading into KubeCon / CloudNativeCon / OpenShift Commons in Berlin this week. For those that know me, you know I love the (b)leading edge of cloud and devops, and last year, when I built an AWS cluster for OpenContrail + OpenShift / Kubernetes and shared it in my Getting to #GIFEE blog, github repo, and demo video, those were early days for OpenContrail in the K8s space. It was work pioneered by an OpenContrail co-founder and a few star engineers to help me cut through the off-piste mank and uncharted territory. It was also inspired by my friends at tcp cloud (now Mirantis), who presented on smart cities / IoT with K8s, OpenContrail, and Raspberry Pi at the last KubeCon EU. Today, OpenContrail, the all-capability open SDN, is ready to kill it in the Kubernetes and OpenShift spaces, elevating it to precisely what I recently demo-shopped for in skis: a true all-mountain top performer (BTW my vote goes to the Rosi Exp. HD 88). Now you know what I was doing when not blogging recently… Skiing aside, our demos are ready for the KubeCon audience this week at our Juniper booth, and I’ll be there talking about bringing a ton of extra value to the Kubernetes v1.6 networking stack.

What sets OpenContrail apart from the other networking options for Kubernetes and OpenShift is that it brings an arsenal of networking and security features to bear on your stack, developed over years by the performance-obsessed folks at Juniper and many other engineers in the community. In this space OpenContrail often competes with smaller SDN solutions offered by startups, whose purported advantage is being nimbler and simpler. That was particularly true of their easy, integrated installation with K8s (coming for OpenContrail very soon), but on the whole, their advantage across the likes of Contiv, Calico, Flannel, Weave, etc. boils down to two things. Let’s have a look…

First, it is easy to be simpler when primitive. This isn’t a knock on the little guys. They are very good for some pointed use cases, and indeed they are simple to use. They do have real limits, however. Simple operation can always come from fewer knobs and fewer capabilities, but I believe an important goal of SDN, self-driving infrastructure, and cognitive/AI in tech is abstraction; in other words, simplicity needn’t come from stripping away functionality. We only need to start with a good model for excellent performance at scale and design for elegance and ease of use. It’s not easy to do, but have you noticed that Kubernetes itself is the perfect example of this? It was super easy two years ago, and it still is, but at the same time there’s a lot more to it today. It’s built on solid concepts and architecture, from a lot of wisdom at Google. Abstraction is the new black. It’s open, layered and transparent when you need to peel it back, and it is the solution to managing the complexity that arises from features and concepts that aren’t for everyone. General arguments aside, if you look at something like K8s ingress or services networking/load balancing, none of the SDN solutions cover that except very pointed ones like GKE or nginx, where you’d still need one or more smaller SDN tools. Furthermore, when you start to demand network performance on the control (protocol/API) and data (traffic feeds and speeds) planes, with scale in many other K8s metrics, the benefits of OpenContrail that you get for free really stand apart from the de facto K8s networking components and these niche offerings. Better still, you get it all as one OpenContrail solution that is portable across public and private clouds.

Second, other solutions are often solely focused on K8s or CaaS stacks. OpenContrail is an SDN solution that isn’t just for Kubernetes or containers. It’s one ring to rule them all. It works on adjacent CaaS stack integrations with OpenShift, Mesos, etc., but it’s equally if not more time-tested in VMware and OpenStack stacks, and even bare-metal or DIY orchestration of mixed runtimes. You might have heard of publicized customer cases for CaaS, VMware and OpenStack across hybrid clouds, continuing with the same OpenContrail networking irrespective of stacking PaaS and CaaS upon public/private IaaS or metal (here are a few: 1, 2, 3). It’s one solution unifying your stack integration, network automation and policy needs across everything. In other words, the only-one-you-need all-mountain ski. The only competitor that comes close to this breadth is Nuage Networks, but they get knocked out quickly for not being open source (without tackling other important performance and scale details here). If the right ski for the conditions sounds more fun to you, like the right tool for the right job, then wait for my next blog on hybrid cloud…and what I spoke about at IBM Interconnect (sorry, no recording). In short, while having lots of skis may appeal for different conditions, having lots of networking solutions is a recipe for IT disaster because the network foundation is so pervasive and needs to connect everything.

With OpenContrail developers now making it dead easy to deploy with K8s (think kubeadm, Helm, Ansible…) and seamless to operate with K8s, it’s going to do for Kubernetes networking what it did for OpenStack networking, and very quickly. I’m really excited to be a part of it.

You can visit the Juniper booth to find me and hear more. As you may know, Juniper is the leading contributor to OpenContrail, but we’re also putting K8s to work in the Juniper customer service workbench and the CSO product (our NFV/CPE offering). Moreover, and extremely complementary to Juniper’s Contrail Networking (our OpenContrail offering), Juniper acquired AppFormix just four months ago. AppFormix is smart operations management software that will make your OpenStack or K8s cluster ops akin to a self-driving Tesla (a must if your ops experience is about as much fun as a traffic jam). Juniper’s AppFormix will give you visibility into the state of workloads, both apps and software-defined infrastructure stack components alike, and it applies big data and cognitive machine learning to automate and adapt policies, alarms, chargebacks and more. AppFormix is my new jam (with Contrail PB)… it gathers and turns your cloud ops data into insight… The better and faster you can do that, the better your competitive advantage. As you can tell, KubeCon is going to be fun this week!

DEMOS DEMOS DEMOS

+update 3/28
Thanks to the Juniper marketing folks for a list of the demos that we'll be sharing at KubeCon:

  1. Contrail Networking integrated with a Kubernetes cluster (fully v1.6 compatible) -
    • Namespaces/RBAC/Network Policy improved security with OpenContrail virtual networks, VPC/projects and security groups (YouTube demo video)
    • Ingress and Services load balancing with OpenContrail HAProxy, vRouter ECMP, and floating IP addresses, improving performance (removing kube-proxy) and consolidating the SDN implementation (YouTube demo video)
    • Connecting multiple hybrid clouds with simple network federation and unified policy
    • Consolidating stacked CaaS on IaaS SDN: optimized OpenContrail vRouter networking "passthru" for Kubernetes nodes on top of OpenStack VMs
  2. Contrail Networking integrated with an OpenShift cluster (YouTube demo video)
  3. Contrail Networking containerized and easy deployment in Kubernetes
  4. AppFormix monitoring and analytics for Kubernetes and workload re-placement automation

I'll light these up with links to YouTube videos as I get the URLs.