Serverless Can Smarten Up Your DevOps & Infrastructure Stack by James Kelly

As IT organizations and cultures reshape themselves for speed, we’ve seen a trend away from IT as a service provider, to IT as:

  • a collection of smaller micro-service provider teams,
  • each operating and measuring like autonomous (micro-)product teams,
  • each building and running technology systems as micro-services.

Now you’ve heard me cover the intersection of microservices and containers quite a bit, along with CaaS/Kubernetes or PaaS/OpenShift orchestration. In this blog, I’m going to change it up a bit and talk about the other kind of microservices building block: code functions.

Here I’m talking about microservices run atop a serverless computing, a.k.a. function-as-a-service (FaaS), stack. Serverless is a two-years-new yet already well-covered topic; if you need a primer, this one is the best I’ve found.

In my last blog, I examined avoiding cloud lock-in by bringing your own devops and software-defined infrastructure stack: by bringing your own open-source/standard toolchain across public/private IaaS, you maintain portability and harmonized, if not unified, hybrid cloud policy. This software-defined infrastructure stack, underpinning the user- or business-facing applications, is itself still just a bunch of software applications. FaaS is just one example of an increasingly popular app-developer technology that infrastructure developers can employ too, though it hasn’t gotten much attention yet in that regard.

One thing the primer above won’t have taught you: the popularity of FaaS systems like Lambda in reducing developer friction has led to a number of open source projects like OpenWhisk, and others that sit atop a CaaS/PaaS (such as Fission or Kubeless). If a FaaS is likely to exist in your application stack, it may well make sense to standardize on incorporating one into your devops and infrastructure stack. Let’s see.

I&O EVENTS

When it comes to I&O events, does your mind go to a pager waking somebody up in the middle of the night? Better to wake up a function and keep sleeping.

The main area of natural applicability for FaaS is event processing; such code functions are, after all, event handlers.

In the devops world, for development and application lifecycle events, FaaS presents a simple way to automate further. The scale is probably low for most CI/CD events, so the main benefit of employing FaaS here is the simplicity and agility in your pipelines as code. It also helps foster the change and process innovation needed to do pipelines as code in the first place. In the devops area of continuous response (CR), which Randy Bias coins here, there is almost certainly more scale and importance in responding to events, where the ability of FaaS to handle on-demand, bursty scale could also be a significant benefit.
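To make that concrete, here is a minimal sketch of such an event-handler function, written in a Lambda-like style. The field names (`action`, `committer`, `release`) and the follow-up actions are hypothetical illustrations, not any particular CI system’s schema:

```python
import json

def handle_pipeline_event(event):
    """Decide a follow-up action for a CI/CD lifecycle event.

    The payload fields here are hypothetical webhook fields,
    chosen only to illustrate the dispatch pattern.
    """
    action = event.get("action")
    if action == "build_failed":
        # e.g. ping the committer in a chat channel
        return {"notify": event.get("committer"), "channel": "ci-alerts"}
    if action == "deploy_succeeded":
        # e.g. kick off smoke tests against the new release
        return {"trigger": "smoke-tests", "release": event.get("release")}
    return {"ignore": True}

def lambda_handler(event, context):
    """Lambda-style entry point: unwrap the webhook body and dispatch."""
    return handle_pipeline_event(json.loads(event["body"]))
```

The appeal is that each pipeline hook stays a small, independently deployable piece of code, rather than logic buried in a long-running CI server plugin.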

In the software-defined infrastructure world, there are traditional events such as change logs, alarms, and security alerts, as well as modern telemetry and analytics events. Of all these types, analytics, telemetry, and security events strike me as the area where FaaS could present the biggest benefit and the most interesting examination here. Just as application analytics for CR (processing telemetry such as API calls, clicks, and other user events) needs scale, software-defined infrastructure is increasingly generating massive amounts of telemetry data that need processing.

Systems like Juniper Networks AppFormix and Contrail Networking already do some telemetry collection and processing (see for yourself). AppFormix, a platform for smart operations management, in fact processes much of the telemetry with light machine learning on the edge node where it’s generated, so that sending the big data for heavy-duty crunching and machine learning in the centralized control plane is more efficient. We at Juniper have written before about this distributed processing model and its benefits, such as truer real-time visibility monitoring to show you what’s happening, because the devops call, “You build it, you run it,” sounds great until you’re running in a multi-tenant/team/app environment and chaos ensues. AppFormix also has a smart policy engine that uses static thresholds as well as anomaly detection to generate... you guessed it... events!

Just like devops event hooks can trigger a FaaS function, so can these software-defined infrastructure events.

Obviously a system like AppFormix implements much of the CR machine-learning smart glue for the analytics of your stack and its workloads; smart-driving ops, as I call it. Other systems, like Contrail Networking with its analytics Kafka or API interface, are rawer. The same is true of other SD-infrastructure components, like the raw Kubernetes watcher events with which some FaaS systems like Fission already integrate.
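For instance, a function wired to raw Kubernetes watch events might look like this sketch. The event shape follows the Kubernetes watch API (`{"type": ..., "object": ...}`), but the remediation actions returned are purely illustrative placeholders:

```python
def handle_pod_event(event):
    """Handle one raw Kubernetes watch event for a pod.

    Event shape follows the watch API; the actions returned
    are hypothetical, standing in for real remediation logic.
    """
    etype = event.get("type")          # ADDED / MODIFIED / DELETED
    pod = event.get("object", {})
    name = pod.get("metadata", {}).get("name", "<unknown>")
    phase = pod.get("status", {}).get("phase")
    if etype == "MODIFIED" and phase == "Failed":
        # e.g. fire an alert or capture logs before the pod is reaped
        return {"action": "alert", "pod": name}
    if etype == "DELETED":
        # e.g. record the deletion for audit purposes
        return {"action": "audit-log", "pod": name}
    return {"action": "none", "pod": name}
```

A FaaS platform with a Kubernetes watch trigger would invoke a function like this once per event, so you pay (in compute and attention) only when something actually happens.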

The rawer the events, I bet, the more frequent they are too, so you have to decide at what point the frequency becomes so high that it’s more efficient to run a full-time process instead of functions on demand. That depends on the scale at which you want to run your own FaaS tool, or use a public cloud FaaS service (of course, that only makes sense if you’re running the stack on that public cloud and are willing to trade off some lock-in until FaaS offerings mature toward a usable standard... the Serverless Framework is worth a look here).
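You can sketch that break-even point with back-of-the-envelope arithmetic. The prices below are illustrative placeholders (roughly Lambda-like per-GB-second and per-million-request pricing), not real quotes:

```python
def faas_monthly_cost(events_per_sec,
                      ms_per_event=100,
                      gb_mem=0.125,
                      price_per_gb_s=0.0000166667,   # illustrative price
                      price_per_million_req=0.20):   # illustrative price
    """Rough monthly cost of handling an event stream with FaaS."""
    invocations = events_per_sec * 30 * 24 * 3600  # events per 30-day month
    compute = invocations * (ms_per_event / 1000.0) * gb_mem * price_per_gb_s
    requests = (invocations / 1_000_000) * price_per_million_req
    return compute + requests

# Compare against an always-on process, e.g. a small VM at ~$30/month:
# around 1 event/sec, FaaS costs on the order of a dollar a month;
# around 100 events/sec, the always-on process is likely cheaper.
```

The exact crossover obviously depends on your event duration, memory footprint, and real pricing, but the shape of the calculation is the point: per-invocation billing wins at low, bursty rates and loses at sustained high rates.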

TO WHAT END?

One of the ends you could have in mind for running operations-focused functions is greater operational intelligence. As we’ve discussed, AppFormix does this to a good extent, but it does it within a cluster. We can consider further smart automation use cases in-cluster, but using a FaaS external to your clusters, your infrastructure developers could also scope your intelligence deeper or broader.

For example, you could go deeper, processing telemetry in customized ways that AppFormix does not today. Imagine you’re running on GCP: you could do deep learning on telemetry data, even with TPUs, I might add. To borrow a metaphor from a common ML example, image recognition: where AppFormix might be able to tell you, “I see a bird,” a powerful deep learning engine with TPU speed-up may, in real-enough time, be able to tell you, “I see a peregrine falcon.” In more practical terms, you may be able to go from detecting anomalies or security alerts to operations and security situational awareness, where your systems can run themselves better than humans can. After all, they already can at lower levels, thanks to systems like Kubernetes, Marathon, and AppFormix.

Rather than deeper, broader examples could be use cases that extend beyond your cluster: for example, auto-scaling or auto-healing of the cluster itself by integrating with physical DCIM systems or the underlying IaaS. Yes, folks, it’s serverless managing your servers!
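As a sketch of that beyond-the-cluster idea: a function triggered by a capacity alert could compute a desired node count and then call the DCIM or IaaS API to resize the cluster. The API call is stubbed out here, and the proportional-scaling rule and thresholds are illustrative assumptions:

```python
import math

def desired_node_count(current_nodes, avg_cpu_pct,
                       target_cpu_pct=60, min_nodes=3, max_nodes=50):
    """Proportional scaling decision an external auto-scaling
    function could make; the thresholds are illustrative."""
    if avg_cpu_pct <= 0:
        return current_nodes
    desired = math.ceil(current_nodes * avg_cpu_pct / target_cpu_pct)
    return max(min_nodes, min(max_nodes, desired))

def handle_capacity_alert(alert):
    """Entry point a FaaS platform would invoke on a capacity alert."""
    new_count = desired_node_count(alert["nodes"], alert["avg_cpu_pct"])
    # Here you would call the underlying IaaS or DCIM API, e.g. an
    # instance-group or node-pool resize call, to apply the decision.
    return {"resize_to": new_count}
```

Because the function only wakes on alerts, the scaling brain itself consumes nothing while the cluster is healthy, which is exactly the serverless-managing-servers appeal.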

In closing, there is plenty more to be written on the future of serverless. I expect we’ll see something soon in the CNCF, and then we’ll definitely start hearing of more application use cases. But hopefully this blog sparked an idea or two about its usefulness beyond the frequent HTTP-request and API-callback use cases for high-level apps. (Please share your ideas.) For intent-driven, self-driving, smart-driving infrastructure, I think a FaaS track is emerging.

How to Avoid the Cloud Trap by James Kelly

trap.jpg

Do you ever want something so badly that you might just fall into a trap trying to get it? That cheese looks awfully yum…whack! Oops :(

When speed = haste 

When speed = haste, we blind ourselves to potential pitfalls. One area of much haste today in many enterprise IT organizations is, you guessed it, the move to the cloud. I was visiting a customer today, and I was told a familiar story: their application team has built a new application proof-of-concept on AWS, and the line of business is going to fund this app to move ahead. The familiar part of the story line is that this was all achieved in X weeks instead of X months. With results that impressive, developers, with an executive push behind them, are soon pulling the rest of IT to do cloud.

Embracing cloud native is a great thing, but doing so with this haste is playing right into AWS’s hand. Not to pick on AWS (this can happen anywhere), but it’s most likely to happen at AWS because they’re the incumbent choice, and there are so many services to help developers deliver something cloud-native quickly.

To developers, these AWS services look like candy to Garfield on Halloween. Unsuspecting, many will never see the trap. Why? Because the lock-in police within the IT organization are patrolling I&O, while the lock-in felons have cleverly gone to work on the application developers.

The new ‘lock-in’ is from developers,
not infrastructure and operations

I often talk about the move to cloud being led by developers and the devops trend; however, I hadn’t put two and two together quite like I did today with respect to lock-in. Although I often preach that 80-odd percent of enterprises are targeting the hybrid cloud model, and that to make hybrid cloud a true IT platform there must be portability across clouds, I guess I hadn’t seen the inverse of that concern.

That brings us to the solution: how to achieve portability and a hybrid cloud IT platform with less, though not absolutely zero, lock-in.

Let’s just examine the main public cloud business models for their services.

  1. First, nearly everyone is very familiar with IaaS. It’s offered at almost all big clouds, certainly the main 3.
  2. Then, there is the managed service model, where the cloud provider will usually take the many popular open-source software projects and offer them as individual managed services.
  3. Finally, there are the services that are homegrown or customized by the provider and offered as a service.

This is a decent summary of the main service models, but certainly not the only things cloud providers can offer. Some other interesting examples: AWS offers Direct Connect; GCP differentiates with its own high-speed global network to interconnect regions; Azure has the broadest compliance coverage. Anyway, if you understand the three models above, you can probably easily grasp what’s coming next: developers who use customized, homegrown services are clearly locking themselves in the most.

Getting conscious about lock-in

So should you never use these customized cloud services? Of course you should if it is really worth it, but do it consciously. Obviously, services that are unique prevent apps that touch them from being very portable across your hybrid cloud platform.

The good news is that most cloud services today have an open source project that matches most of what any one service can do. If not, there is still a good chance you can achieve similar benefit with a combination of open source projects, or with vendor products based on such projects.

Bring your own stack

Perhaps the main thing that enables IT to go from multi-cloud to consciously building a universal hybrid cloud platform is a unified toolchain and unified policy. Some policy unification can be achieved with meta-orchestrators that work across multiple clouds, like Red Hat CloudForms and RightScale, but they have their limitations. Using the many benefits of public and private cloud IaaS, you can still control the application and devops stack to achieve portability and mitigate lock-in by bringing your own full stack that sits atop any IaaS.

Anyone can do multi cloud. Throw in a little bit of this, and a little bit of that. But the recipe for a happier hybrid cloud seems to be known to some organizations, but not to all, so let’s have a look at how to do cloud with less lock-in and more portability.

Embrace cloud IaaS as a base for your devops stack, but use other cloud services sparingly:

  • Bring your own IaaS automation and abstraction (such as Terraform and config management tools like Puppet, Chef, Ansible, etc.)
  • Lock in to cloud services consciously when they are unique and necessary for business advantage
  • For services that have open source tool equivalents, bring your own tool or at least use a managed service that has the generic API

Start with 2 clouds instead of one. This will…

  • Prevent you from tethering yourself to just one partner for cloud innovation & economics
  • Force the application cluster / stack to be portable
  • Force the DevOps workflows to be portable
  • Force designing for resiliency and scale early on

Take the naïve out of cloud native

If your business case for cloud is that your developers’ new-found speed is knocking the socks off of your executives, then maybe have a closer look.

There is a better way to “do cloud” with portable apps, automation, and software-defined infrastructure atop any IaaS, and even better, there is actually plenty of help. Check out the Cloud Native Computing Foundation, which Juniper and many other vendors recently joined, and for your unified toolchain, check out Juniper’s cloud software portfolio that fits equally well across your public/private/hybrid cloud venues, such as AppFormix, vSRX, vMX, and Contrail Networking, built from OpenContrail.

 

Juniper & Red Hat Serve Up an Open Double-Stack Cloud with an SDN Twist by James Kelly

Photo credit Tom Haverford

Juniper Networks Contrail Networking, developed in the OpenContrail open source project, has long been a part of Red Hat’s millinery. The partnership between Juniper and Red Hat goes back some years now. Collaborating on OpenStack cloud and NFV infrastructure has won these partners success in supporting large enterprises and communications service providers like Orange Business Services.

At the long list of open source festivities in Boston over the next two weeks, you will hear these partners in cloud building on their past successful OpenStack + Contrail integration and now putting the spotlight on new integrations to support cloud native. You’ve heard me blog about the OpenContrail integration with OpenShift a year ago already (in its first alpha form, which I demoed), and more recently for CloudNativeCon and DockerCon, talking about how we evolved that work to make the integration enterprise-ready and up-to-date with all the innovation in the fast-paced OpenShift releases.

But how do you get the best of OpenStack and OpenShift?

Red Hat has long been helping customers move faster with devops, continuous delivery, and containers using OpenShift. Naturally, Red Hat often does this atop Red Hat OpenStack Platform, where OpenStack creates clusters of virtual machine hosts for the Red Hat OpenShift Container Platform cluster.

One latent hitch in this double stack is the software-defined networking (SDN). Like OpenStack, OpenShift and Kubernetes (on which OpenShift is based) have their opportunities for improvement. One area that is frequently fortified and improved is the SDN, and the importance of doing so doubles when you’re running the double stack of OpenShift on top of OpenStack.

Why can SDN be such a snag? Well, the network is a critical part of any cloud, and especially cloud-native, infrastructure because of the enormous volume of microservice-generated east-west traffic, along with load balancing, multi-tenancy, and security requirements; that’s just for starters. The good-enough SDN that is included but swappable in such open source cloud stacks is very often indeed good enough for small cloud setups, but it is common to see the SDN replaced with something more robust for clouds with more than 100 nodes or other advanced use cases like NFV.

Fast and Furious Clouds

I think of this 100-node edge like the 100mph edge of a car… As I make my way to Boston, I just took an Uber to SFO, and of course it was a Prius. Like most cars nowadays, they’re very efficient and great for A-to-B commuting. But also like most cars, when you approach or (hopefully don’t) pass the 100mph mark, the thing feels like it is going to disintegrate! I was a little uneasy today flying down the 101 at just 80-something mph. Eeek!

Now I love to drive, and drive fast, but I don’t do it in a bloody Prius! My speed-seeking readers will probably know, as I do, that if you go fast in a sportscar, it is a way better experience. It’s smooth at high speeds, and the power feels awesome. Now let’s say you also aren’t just driving in a straight line, but you’re on the Laguna Seca Raceway. To handle cornering agility, gas and maintenance pit stops, and defense/offense versus the other drivers, you need more than just smooth handling. The requirements add up. I think you get my drift…

My point is that, similarly, if you’re going to build a cloud, you need to consider that you’re going to need a lot more than good enough. Good enough might do you fine for some dev/test or basic scenarios, but if you need performance, elastic scale, resilience, F1-pit-crew speed and ease of maintenance, and a security defense/offense, then you need to invest in building the best. Juniper has been helping its customers build the best networks for over 20 years. It’s what Juniper is known for: high performance and innovation. When you (or say NSS) compare security solutions, again, Juniper is on top for performance, efficacy (stopping the bad stuff), and, I might add, value.

When Compounding Good-enough Isn’t Great

Imagine the challenges you can run into with good-enough networking… now imagine you stack two such solutions on top of each other. That’s what happens with OpenShift or Kubernetes on top of OpenStack.

In this kind of scenario, with the two stacks’ SDNs compounded, as you demand more from your cloud you will double the complexity and hit network disintegration twice as quickly!

Ludacris Mode for Your Cloud

If you want to drive your cloud fast and furious, and not crash, you need some racing readiness. OpenContrail is designed for this and proven in some of the largest, most demanding clouds. No need to recount the awesomeness here; it’s already well documented. Before you green-light it, though, there is one thing we’ve needed to iron out: how does an OpenContrail double stack drive?

SDN Inception

When OpenContrail developers were in their first throes of SDN integration for Kubernetes and OpenShift, we often ran it inside of an OpenStack cloud. And what was the OpenStack SDN? OpenContrail, of course. Yep, that’s right: we have OpenContrail providing an IP overlay on top of the physical network for the OpenStack VM connectivity, and then inside of that overlay and those VMs we installed OpenShift (or Kubernetes) with another OpenContrail overlay. It turns out this SDN inception works just fine. There’s nothing special to it: OpenContrail just requires an IP network, and the OpenStack-level OpenContrail fits the bill perfectly.

In fact, SDN inception is pretty common, but not usually with the same SDN at both levels. The main place this happens in practice is when we run cloud-native CaaS/PaaS stacks like Kubernetes, Mesos, or OpenShift paired with OpenContrail on top of public clouds, and that public IaaS, like AWS, has its own underlying SDN. It provides the IP underlay that we need in those cases.

What about when we control the SDN at the IaaS AND the CaaS/PaaS layers? Even if 2 SDNs (the same or different solution) work well stacked atop each other, it’s not ideal because there is still double the complexity of managing them. If only there was a better way…

A New Hat Stack Trick

This is where the OpenContrail community was inspired to raise the bar, and the Red Hat stack of OpenShift on OpenStack is the perfect motivation. What’s now possible today is to unwind the SDN inception and use one single control and data plane for OpenShift or Kubernetes on top of OpenStack when you run OpenContrail. The way this is realized is by having the OpenStack layer work as usual, and using OpenContrail in a different way with OpenShift or Kubernetes. In that instance, the OpenContrail plugin for OpenShift/Kubernetes master will speak directly to the OpenContrail controller used at the OpenStack layer. To collapse the data plane, we have a CNI plugin passthru that will not require the OpenContrail vRouter to sit inside the host VM for each OpenShift/Kubernetes minion (compute) node. Instead the traffic will be channeled from the container to the underlying vRouter that is sitting on the OpenStack nova compute node. We’ll save further technicalities and performance boost analysis for an OpenContrail engineering blog another day.

Juniper and Red Hat’s work on this latest innovation, flattening the SDN stack, is coming to fruition. It is available today in the OpenContrail community or the Juniper Contrail Networking beta, and slated for Juniper’s next Contrail release. As to that, stay tuned. As to catching this in action, visit Juniper and Red Hat at the Red Hat Summit this week and the OpenStack Summit next week. We’ll see you there, and I hope you’ll hear about this and more OpenContrail community innovations, ahead and in deployment, at the OCUG next week.