When you’re off to the races with Kubernetes, the first order of business as a developer is figuring out a micro-services architecture and your DevOps pipeline to build pods. However, if you are the Kubernetes cluster I&O pro, also known as site reliability engineer (SRE), then your first order of business is figuring out Kubernetes itself, as the cluster becomes the computer for pod-packaged applications. One of the things a cluster SRE deals with—even for managed Kubernetes offerings like GKE—is the cluster infrastructure: servers, VMs or IaaS. These servers are known as the Kubernetes nodes.
The Pursuit of Efficiency
When you get into higher-order Kubernetes there are a few things you chase.
First, multi-purpose clusters can squeeze out more efficiency of the underlying server resources and your SRE time. By multi-purpose cluster, I mean running a multitude of applications, projects, teams/tenants, and devops pipeline stages (dev/test, build/bake, staging, production), all on the same cluster.
When you’re new to Kubernetes, such dimensions are often created on separate clusters, per project, per team, etc. As your K8s journey matures though, there is only so long you can ignore the waste this causes in the underlying server-resource capacity. Across your multicloud, consolidating many clusters into as few as practical for your reliability constraints also saves you time and less swivel-chairing for: patching, cluster upgrades, secrets and artifact distribution, compliance, monitoring, and more.
Second, there’s the constant chase of scaling efficiency. Kubernetes and active monitoring agents can help take care of auto-scaling individual micro-services, but scaling out assumes you have capacity in your cluster nodes. Especially if you’re running your cluster atop IaaS, it’s actually wasteful to maintain—and pay for—extra capacity in spare VM instances. You probably need some buffer because spinning up VMs is much slower than for containers and pods. Dynamically right-sizing your cluster is quite the predicament, particularly as it becomes more multi-purpose.
True CaaS: Serverless Containers
When it comes to right-sizing your cluster scale, while the cloud providers are happy to take your money for extra VMs powering your spare node capacity, they do have a better solution. At Re:Invent 2017, AWS announced Fargate to abstract away the servers underneath your cluster. Eventually it should support EKS in addition to ECS. In the meantime, Azure Container Instances (ACI) is a true Kubernetes-pods as a service offering that frees you from worrying about the server group upon which it’s running.
Higher-order Kubernetes SRE
While at Networking Field Day 17 (NFD17 video recording), I presented on “shifting left” your networking and security considerations to deal with DevOps and multi-purpose clusters. It turns out that on the same day, Software Engineering Daily released their Serverless Containers podcast. In listening to it you’ll realize that such serverless container stacks are probably the epitome of multi-purpose Kubernetes clusters.
What cloud providers offer in terms of separation of concerns with serverless container stacks, great cluster SREs will also aim to provide to the developers they support.
When you get to this level of maturity in Kubernetes operations, you’re thinking about a lot of things that you may not have originally considered. This happens in many areas, but certainly in networking and security. Hence me talking about “shift left,” so you can prepare to meet certain challenges that you otherwise wouldn’t see if you’re just getting Kubernetes up and running (great book by that name).
In the domain of open networking and security, there is no project that approaches the scalability and maturity of OpenContrail. You may have heard of the immortal moment, at least in the community, when AT&T chose it to run their 100+ clouds, some of enormous size. Riot Games has also blogged about how it underpins their DevOps and container runtime environments for League of Legends, one of the hugest online games around.
Cloud-grade Networking and Security
For cluster multi-tenancy, it goes without saying that it’s useful to have multi-tenant networking and security like OpenContrail provides. You can hack together isolation boundaries with access policies in simpler SDN systems (indeed, today, more popular due to their simplicity), but actually having multi-tenant domain and project isolation in your SDN system is far more elegant, scalable and sane. It’s a cleaner hierarchy to contain virtual network designs, IP address management, network policy and stateful security policy.
The other topic I covered at NFD17 is the goal of making networking and security more invisible to the cluster SRE and certainly to the developer, but providing plenty of control and visibility to the security and network reliability engineers (NREs) or NetOps/SecOps pros. OpenContrail helps here in two crucial ways.
First, virtual network overlays are a first-class concept and object. This is very useful for your DevOps pipeline because you can create exactly the same networking and security environment for your staging and production deployments (here’s how Riot does it). Landmines lurk when staging and production aren’t really the same, but with OpenContrail you can easily have exactly the same IP subnets, addresses, networking and security policies. This is impossible and impractical to do without overlays. You may also perceive that overlays are themselves a healthy separation of concerns from the underlay transport network. That’s true, and they easily enable you to use OpenContrail across the multicloud on any infrastructure. You can even nest OpenContrail inside of lower-layer OpenContrail overlays, although for OpenStack underlays, it provides ways to collapse such layers too.
Second, OpenContrail can secure applications on Kubernetes with better invisibility to your developers—and transparency to SecOps. Today, a CNI provider for Kubernetes implements pod connectivity and usually NetworkPolicy objects. OpenContrail does this too, and much more that other CNI providers cannot. But do you really want to require your developers to write Kubernetes NetworkPolicy objects to blacklist-whitelist the inter-micro-service access across application tiers, DevOps stages, namespaces, etc? I’d love to say security is shifting left into developers’ minds, and that they’ll get this right, but realistically when they have to write code, tests, fixes, documentation and more, why not take this off their plates? With OpenContrail you can easily implement security policies that are outside of Kubernetes and outside of the developers’ purview. I think that’s a good idea for the sanity of developers, but also to solve growing security complexity in multi-purpose clusters.
If you’ve made it this far, I hope you won’t be willfully blind to the Kubernetes SRE-fu you’ll need sooner or later. Definitely give OpenContrail a try for your K8s networking-security needs. The community has recently made it much more accessible to quick-start with Helm packaging, and the work continues to make day-1 as easy as possible. The Slack team is also helpful. The good news is that with the OpenContrail project very battle tested and going on 5 years old, your day-N should be smooth and steady.
PS. OpenContrail will soon be joining Linux Foundation Networking, and likely renamed, making this article a vestige of early SDN and cloud-native antiquity.
image credit alexandersonscc / Pixabay