
Serverless Containers Intensify Secure Networking Requirements by James Kelly


When you’re off to the races with Kubernetes, the first order of business as a developer is figuring out a micro-services architecture and your DevOps pipeline to build pods. However, if you are the Kubernetes cluster I&O pro, also known as site reliability engineer (SRE), then your first order of business is figuring out Kubernetes itself, as the cluster becomes the computer for pod-packaged applications. One of the things a cluster SRE deals with—even for managed Kubernetes offerings like GKE—is the cluster infrastructure: servers, VMs or IaaS. These servers are known as the Kubernetes nodes.

The Pursuit of Efficiency

When you get into higher-order Kubernetes there are a few things you chase.

First, multi-purpose clusters can squeeze out more efficiency of the underlying server resources and your SRE time. By multi-purpose cluster, I mean running a multitude of applications, projects, teams/tenants, and devops pipeline stages (dev/test, build/bake, staging, production), all on the same cluster.
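
To make that concrete, here’s a minimal sketch, using the official Kubernetes Python client, of carving one cluster into namespaces per team and pipeline stage. The team and stage names are illustrative placeholders, not anything prescribed by Kubernetes.

```python
# A minimal sketch of carving a multi-purpose cluster into namespaces,
# one per team/stage combination, with the official `kubernetes` client.
# The team and stage names are illustrative placeholders.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when run in-cluster
core = client.CoreV1Api()

teams = ["payments", "frontend"]           # hypothetical tenants
stages = ["dev", "staging", "production"]  # DevOps pipeline stages on one cluster

for team in teams:
    for stage in stages:
        ns = client.V1Namespace(
            metadata=client.V1ObjectMeta(
                name=f"{team}-{stage}",
                labels={"team": team, "stage": stage},
            )
        )
        core.create_namespace(body=ns)
```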

When you’re new to Kubernetes, such dimensions are often split across separate clusters: per project, per team, and so on. As your K8s journey matures, though, there is only so long you can ignore the waste this causes in underlying server-resource capacity. Across your multicloud, consolidating many clusters into as few as your reliability constraints allow also saves you time and reduces swivel-chairing for patching, cluster upgrades, secrets and artifact distribution, compliance, monitoring, and more.

Second, there’s the constant chase of scaling efficiency. Kubernetes and active monitoring agents can take care of auto-scaling individual micro-services, but scaling out assumes you have spare capacity in your cluster nodes. Especially if you’re running your cluster atop IaaS, it’s wasteful to maintain—and pay for—extra capacity in spare VM instances, yet you probably need some buffer because spinning up VMs is much slower than spinning up containers and pods. Dynamically right-sizing your cluster is quite the predicament, particularly as it becomes more multi-purpose.
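
For the micro-service side of that equation, a minimal sketch with the Kubernetes Python client might look like the following; the deployment name, namespace, and thresholds are assumptions for illustration. Right-sizing the nodes underneath the pods is the harder problem that cluster autoscaling or serverless containers have to solve.

```python
# A sketch of pod-level auto-scaling with the autoscaling/v1 API via the
# official `kubernetes` Python client. The target deployment, namespace,
# and thresholds are illustrative assumptions.
from kubernetes import client, config

config.load_kube_config()
autoscaling = client.AutoscalingV1Api()

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="web-hpa", namespace="frontend-production"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="web"
        ),
        min_replicas=2,
        max_replicas=10,
        target_cpu_utilization_percentage=70,  # add replicas above ~70% CPU
    ),
)
autoscaling.create_namespaced_horizontal_pod_autoscaler(
    namespace="frontend-production", body=hpa
)
```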

True CaaS: Serverless Containers

When it comes to right-sizing your cluster scale, while the cloud providers are happy to take your money for extra VMs powering your spare node capacity, they do have a better solution. At Re:Invent 2017, AWS announced Fargate to abstract away the servers underneath your cluster. Eventually it should support EKS in addition to ECS. In the meantime, Azure Container Instances (ACI) is a true Kubernetes-pods as a service offering that frees you from worrying about the server group upon which it’s running.

Higher-order Kubernetes SRE

While at Networking Field Day 17 (NFD17 video recording), I presented on “shifting left” your networking and security considerations to deal with DevOps and multi-purpose clusters. It turns out that on the same day, Software Engineering Daily released their Serverless Containers podcast. In listening to it you’ll realize that such serverless container stacks are probably the epitome of multi-purpose Kubernetes clusters.

The separation of concerns that cloud providers offer with serverless container stacks is what great cluster SREs will also aim to provide to the developers they support.

When you get to this level of maturity in Kubernetes operations, you’re thinking about a lot of things that you may not have originally considered. This happens in many areas, but certainly in networking and security. Hence my talk about “shifting left,” so you can prepare to meet challenges that you otherwise wouldn’t see while just getting Kubernetes up and running (great book by that name).

In the domain of open networking and security, there is no project that approaches the scalability and maturity of OpenContrail. You may have heard of the immortal moment, at least in the community, when AT&T chose it to run their 100+ clouds, some of enormous size. Riot Games has also blogged about how it underpins their DevOps and container runtime environments for League of Legends, one of the biggest online games around.

Cloud-grade Networking and Security

For cluster multi-tenancy, it goes without saying that it’s useful to have multi-tenant networking and security like OpenContrail provides. You can hack together isolation boundaries with access policies in simpler SDN systems (which are, today, more popular because of that simplicity), but actually having multi-tenant domain and project isolation in your SDN system is far more elegant, scalable, and sane. It’s a cleaner hierarchy in which to contain virtual network designs, IP address management, network policy, and stateful security policy.

The other topic I covered at NFD17 is the goal of making networking and security more invisible to the cluster SRE, and certainly to the developer, while providing plenty of control and visibility to the security and network reliability engineers (NREs) or NetOps/SecOps pros. OpenContrail helps here in two crucial ways.

First, virtual network overlays are a first-class concept and object. This is very useful for your DevOps pipeline because you can create exactly the same networking and security environment for your staging and production deployments (here’s how Riot does it). Landmines lurk when staging and production aren’t really the same, but with OpenContrail you can easily have exactly the same IP subnets, addresses, networking, and security policies. That is impractical, if not impossible, without overlays. You may also perceive that overlays are themselves a healthy separation of concerns from the underlay transport network. That’s true, and they easily enable you to use OpenContrail across the multicloud on any infrastructure. You can even nest OpenContrail inside of lower-layer OpenContrail overlays, although for OpenStack underlays, it provides ways to collapse such layers too.
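
As a rough illustration of that idea, the sketch below creates staging and production namespaces that point at identically configured virtual networks. The annotation key and its JSON value are hypothetical placeholders; the exact annotation an OpenContrail/Contrail CNI release expects is documented with that release. The point is only that, with overlays, both stages can share the same subnet definition.

```python
# Illustrative only: staging and production namespaces attached to virtual
# networks with identical subnets. The annotation key and value below are
# hypothetical placeholders; consult your OpenContrail/Contrail CNI docs
# for the exact annotation it expects.
import json
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# The same overlay subnet definition for both stages, which is workable
# because each stage lives in its own virtual network overlay rather than
# directly on the shared underlay.
network_spec = json.dumps({"name": "app-net", "subnet": "10.10.0.0/24"})

for stage in ["staging", "production"]:
    ns = client.V1Namespace(
        metadata=client.V1ObjectMeta(
            name=stage,
            annotations={"example.opencontrail.org/network": network_spec},
        )
    )
    core.create_namespace(body=ns)
```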

Second, OpenContrail can secure applications on Kubernetes with better invisibility to your developers—and transparency to SecOps. Today, a CNI provider for Kubernetes implements pod connectivity and usually NetworkPolicy objects. OpenContrail does this too, and much more that other CNI providers cannot. But do you really want to require your developers to write Kubernetes NetworkPolicy objects to whitelist and blacklist inter-micro-service access across application tiers, DevOps stages, namespaces, and so on? I’d love to say security is shifting left into developers’ minds, and that they’ll get this right, but realistically, when they already have to write code, tests, fixes, documentation and more, why not take this off their plates? With OpenContrail you can easily implement security policies that live outside of Kubernetes and outside of the developers’ purview. I think that’s a good idea for the sanity of developers, but also to solve the growing security complexity in multi-purpose clusters.
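
For context, this is the kind of object developers would otherwise have to write and maintain themselves: a sketch, with made-up names and labels, of a NetworkPolicy that only lets the frontend tier reach the backend tier inside one namespace.

```python
# A sketch of the kind of Kubernetes NetworkPolicy developers would otherwise
# write per tier, stage, and namespace. The namespace and tier labels are
# made-up examples.
from kubernetes import client, config

config.load_kube_config()
networking = client.NetworkingV1Api()

policy = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(
        name="backend-allow-frontend", namespace="payments-production"
    ),
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(match_labels={"tier": "backend"}),
        policy_types=["Ingress"],
        ingress=[
            client.V1NetworkPolicyIngressRule(
                _from=[
                    client.V1NetworkPolicyPeer(
                        pod_selector=client.V1LabelSelector(
                            match_labels={"tier": "frontend"}
                        )
                    )
                ]
            )
        ],
    ),
)
networking.create_namespaced_network_policy(
    namespace="payments-production", body=policy
)
```

Multiply that by every tier, stage, and namespace in a multi-purpose cluster, and the appeal of lifting such policy out of the developers’ hands becomes obvious.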

If you’ve made it this far, I hope you won’t be willfully blind to the Kubernetes SRE-fu you’ll need sooner or later. Definitely give OpenContrail a try for your K8s networking and security needs. The community has recently made it much more accessible to quick-start with Helm packaging, and the work continues to make day-1 as easy as possible. The Slack team is also helpful. The good news is that with the OpenContrail project thoroughly battle-tested and going on five years old, your day-N should be smooth and steady.

PS. OpenContrail will soon be joining Linux Foundation Networking, and likely renamed, making this article a vestige of early SDN and cloud-native antiquity.

image credit alexandersonscc / Pixabay

Meet the Fockers! What the Fork? by James Kelly

In the past week, all the rage (actual hacker rage in some forums) in the already hot area of containers is about forking Docker code, and I have to say, “what the fork?”

What the Fork?

Really!? Maybe it is because I work in IT infrastructure, but I believe standards and consolidation are good and useful...especially “down in the weeds,” so that we can focus on the bigger picture of new tech problems and the apps that directly add value. This blog installment summarizes Docker’s problems of the past year or so, introduces you to the “Fockers” as I call them, and points out an obvious solution that, strangely, very few folks are mentioning.

This reminds me of the debate about overlay networking protocols a few years back – only then we didn’t blow up Twitter doing it. This is a bigger deal than overlay protocols, however, so my friends in networking may relate it to deciding on the packet standard (cue a laugh about the joy of standards). We have IP today, but it wasn’t always so pervasive.

Docker is actually decently pervasive today as far as containers go, so if you’re new to this topic, you might wonder where the problem is. Quick recap… The first thing you should know is that Docker is not one thing, even though we techies talk about it colloquially as if it were. There’s Docker Inc., there’s Docker’s open-source GitHub code and community, and there are Docker products. Most notably, Docker Engine is usually just “Docker,” and even that is actually a CLI tool plus a daemon (background process). The core tech that made Docker famous is a packaging, distribution, and execution wrapper over Linux containers that is very developer- and ops-friendly. It was genius, open source, and everyone fell in love. There were cheers of “lightning in a bottle” and “VMs are the new legacy!” Even Microsoft has had a whale of a time joining the Docker ecosystem more and more.
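
If you want to see the client/daemon split for yourself, here’s a small sketch using the Docker SDK for Python (the “docker” package on PyPI); the image and command are arbitrary examples. The script is only a client: the daemon does the pulling and running.

```python
# A small sketch of the client/daemon split: this script is only a client,
# while the Docker daemon does the actual pulling and running. Uses the
# Docker SDK for Python; the image and command are arbitrary examples.
import docker

client = docker.from_env()          # connect to the local Docker daemon
print(client.version()["Version"])  # the daemon's version, not this script's

# Pull the image if needed, run a container, and capture its output.
output = client.containers.run(
    "alpine:latest", ["echo", "hello from a container"], remove=True
)
print(output.decode().strip())
```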

Containers, and the good tech around them, have absolutely let tech move faster, transforming CI/CD, devops, cloud computing, and NFV… well, they haven’t quite hit NFV yet, but that’s a different blog.

Docker wasn’t perfect, but when they delivered v1.0 (summer 2014), it was very successful. But then something happened. Our darling started to make promises she didn’t keep and, worse, she got fat. No, not literally, but Docker received a lot more VC funding and grew, organically and with a few acquisitions. This product growth took them beyond just wrapping and running containers into orchestration, analytics, networking, etc. (Hmm. Reminds me of VMware actually. Was that successful for them? :/) Growing adjacent to or outside of core competencies is always difficult for businesses, let alone doing it under the pressure of having taken VC funding and being expected to multiply it. At the same time, as Docker started releasing more features, more frequently, quality and performance suffered.

By and large, I expect that most folks using Docker weren’t using it with support from Docker Inc. or anyone else, but nonetheless, with Docker dissatisfaction, boycotts, and breakups starting to surface, in came some alternatives: fairly new entrants CoreOS and Rancher, with more support from traditional Linux vendors Red Hat (Project Atomic) and Canonical (LXD).

Since containers are fundamentally enabled by the Linux kernel, all Linux vendors have some stake in seeing that containers remain successful and proliferate. Enter OCP – wait, that name is taken – OCI, I mean OCI. OCI aims to standardize a container specification: what goes in the container “packet,” both as an on-disk, portable file format and in what it means to a runtime. If we can all agree on that, then we can innovate around it. There is also a reference implementation, runC, donated by Docker, and I believe that as of Docker Engine v1.11 it adheres to the specification. However, according to the OCI site, the spec is still in progress, which I interpret as a moving target.
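
To give a feel for the on-disk side of what that specification has come to define, here’s a sketch that walks an OCI image-layout directory (an oci-layout file, an index.json, and content-addressed blobs) and prints the referenced manifests. The directory path is an assumption; point it at a layout produced by a tool such as skopeo.

```python
# A sketch of reading an OCI image layout: an `oci-layout` file, an
# `index.json`, and content-addressed blobs under blobs/<alg>/<digest>.
# The path is an assumed example.
import json
from pathlib import Path

layout_dir = Path("./my-image-oci")  # assumed path to an OCI image layout

print((layout_dir / "oci-layout").read_text())  # e.g. {"imageLayoutVersion": "1.0.0"}

index = json.loads((layout_dir / "index.json").read_text())
for ref in index.get("manifests", []):
    alg, hexdigest = ref["digest"].split(":", 1)
    manifest = json.loads((layout_dir / "blobs" / alg / hexdigest).read_text())
    print(ref["digest"], ref.get("mediaType"))
    for layer in manifest.get("layers", []):
        print("  layer:", layer["digest"], layer.get("size"), "bytes")
```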

You may imagine that the differing opinions at the OCI table (and thrown across blogs and Twitter – more below if you’re in the mood for some tech drama), and otherwise just the slow progress, have some folks wanting to fork Docker so we can quickly get to a standard.

But wait, there’s more! In Docker v1.12, announced at DockerCon in June 2016 and nearing release, Docker Inc. refactored their largest tool, Swarm, into the Docker Engine core product. When Docker made acquisitions like SocketPlane that competed with the container ecosystem, it stepped on the toes of other vendors, but they have a marketing mantra to prove they’re not at all malevolent: “Batteries included, but replaceable.” OK, fine, we’ll overlook some indiscretions. But what about bundling the Swarm orchestration tools with Docker Engine? Those aren’t batteries. That’s bundling an airline with the sale of the airplanes.

Swarm is competing with Kubernetes, Mesosphere, and Nomad for container orchestration (BTW, Kubernetes and Kubernetes-based PaaSes are currently the most popular), and this Docker move appears to be a ploy to force-feed Swarm to Docker users whether they want it or not. Of course they don’t have to use it, but it is grossly excessive to have around if unused, not to mention detrimental to quality, time-to-deploy, and security. For many techies, aside from being too smart to have the Swarm pulled over their eyes, this was the last straw for another technical reason: the philosophy that decomposition into simple building blocks is good, and that solutions should be loosely coupled integrations of these building blocks with clean, ideally standardized, interfaces.

Meet the Fockers!

So, what do you do when you are using Docker, but you’re increasingly at odds with the way Docker Inc. uses its open source project as a development ground for a monolithic product, a product you mostly don’t want? Fork them!

Especially among hard-core open source infrastructure users, people have a distaste for proprietary appliances, and Docker Engine is starting to look like one. All-in-ones are usually not so all-in-wonderful. That’s because functions built as decomposed units can proceed on their own innovation paths with more focus and thus, more often than not, beat the same function in an all-in-one. Of course, I’m not saying all-in-ones aren’t right for some, but they’re not for those that demand the best, nor for those that want the choice to swap components instead of getting locked into a monolith.

All-in-ones are usually not so all-in-wonderful.

Taking all this into account, all the talk of forking Docker seems bothersome to me for a few reasons.

First of all, there are already over 10,000 forks of Docker. Hello! Forking on GitHub is click-button easy, and many have done it. As many have said, forking = fragmentation, which makes things harder for those that need to use and support multiple forks or test integrations with them – aka the container ecosystem of users and vendors.

Second, whoever creates a fork presumably wants to change it (BTW, non-techies, a fork is the new copy of the code that you now own – thanks for reading). I haven’t seen anybody actually comment on what they would change if they forked the Docker code. All this discussion and no prescription :( Presumably you would fix bugs that affect you or try to migrate fixes that others have developed, including Docker Inc., who will obviously continue to develop them. What else would you do? Probably scrap Swarm, because most of you (in discussions) seem to be Kubernetes fans (me too). Still, in truth you have to remove and split out a lot more if you really want to decompose this into its basic functions. That’s not trivial.

Third, let’s say there is “the” fork. Not my fork or yours, but the royal fork. Who starts this? Who do you think should own it? Maybe it would be best for Docker if they just did this! They could re-split out Swarm and other tools too.

Don’t fork, make love.

Finally, my preferred solution: don’t fork, make love. In seriousness, what I mean is two things:

1. There is a standards body, the OCI. Make sure Docker has a seat at the table, and get on with it! If they’re not willing to play ball, then let it be widely known, and move OCI forward without them. I would think they have the most to lose here, so they would cooperate. The backlash if they didn’t might be unforgiving.

2. There is already a great compliant alternative to Docker today that few are talking about: rkt (pronounced “rocket”). It was pioneered by CoreOS and is now supported on many Linux operating systems. rkt is very lightweight, and it has no accompanying daemon. It is a single, small, simple function that helps start containers, so it passes the test of a small, decomposed function. Even better, the most likely Fockers are probably not Swarm users, and all the other orchestration systems support rkt: Kubernetes’s rktnetes, Mesos’s unified containerizer, and Nomad’s rkt driver. We all have a vote, with our wallets and free choices, so maybe it is time to rock it on over to rkt.

 

In closing, I offer some more ecosystem observations:

In spite of this criticism (much of it is not mine, in fact, but that of others to which I’m witness), I’m a big fan of Docker and its people. I’m not a fan of the recent strategy to fold Swarm into the core, however. I believe it goes against the generally accepted principles of computer science and Unix. IMO, Docker Engine was already too big pre-v1.12. Take Swarm out, compete on the battleground of orchestration on your own merits, and find new business models (like Docker Cloud).

I feel like the orchestration piece is important, and the momentum is with Kubernetes. I see the orchestration race a little like OpenStack vs. CloudStack vs. Eucalyptus five years ago, where the momentum then was with OpenStack; today it’s with Kubernetes, but for different reasons. It has well-designed and cleanly defined interfaces and abstractions. It is very easy to try and learn. To those who say it is hard to learn, set up, and use: what was your experience with OpenStack? Moreover, there are tons of vanilla Kubernetes users; you can’t say that about OpenStack. In the OpenStack space, vendors add value with support and getting you to day-1, but in the Kubernetes space, vendors add features on top of Kubernetes, moving toward PaaS. That’s a good sign that the Kubernetes scope is about right.
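
As a small illustration of those clean abstractions, the official Python client mirrors the API groups directly; a few lines are enough to walk a cluster’s deployments and pods (this assumes a working kubeconfig on any cluster, vanilla or vendor-packaged).

```python
# A small illustration of Kubernetes' clean, versioned API abstractions via
# the official Python client. Assumes a working kubeconfig.
from kubernetes import client, config

config.load_kube_config()

apps = client.AppsV1Api()
core = client.CoreV1Api()

for d in apps.list_deployment_for_all_namespaces().items:
    print(f"deployment {d.metadata.namespace}/{d.metadata.name}: "
          f"{d.status.ready_replicas or 0}/{d.spec.replicas} ready")

for p in core.list_pod_for_all_namespaces().items:
    print(f"pod {p.metadata.namespace}/{p.metadata.name} -> {p.status.phase}")
```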

I’ve seen 99% Docker love and 1% anti-Docker sentiment in my own experience. My employer, Juniper, is a Docker partner and uses Docker Engine. We have SDN and switching integrations for Docker Engine, so personally, my hope is that Docker finds a way through this difficulty by re-evaluating how it uses open source and business together. Open source is not a vehicle to drive business or shareholder value, which is inherently competitive; it is a way to drive value to the IT community at large in a way that is intentionally cooperative.

Related:

http://thenewstack.io/docker-fork-talk-split-now-table/

https://medium.com/@bob_48171/an-ode-to-boring-creating-open-and-stable-container-world-4a7a39971443#.yco4sgiu2

http://thenewstack.io/container-format-dispute-twitter-shows-disparities-docker-community/

https://www.linkedin.com/pulse/forking-docker-daniel-riek

Getting to GIFEE with SDN: Demo by James Kelly

A few short years ago, advocating for open source and cloud computing was even more difficult than touting the importance of clean energy and the realities of climate change. The doubters and naysayers, vocal as they are, are full of reasons why things are (fine) as they are. Reasons, however, don’t get you results. We needed transformative action in IT, and today, as we sit right between the Google NEXT event and the OpenStack Summit in Austin, open source and cloud are the norm for the majority.

After pausing for a moment of vindication – we told you so – we get back to work to improve further and look forward, and a good place to look is indeed at Google: a technology trailblazer by sheer necessity. We heard a lot about GCP at NEXT, especially their open source project Kubernetes, which powers GKE. What’s most exciting about such container-based computing with Docker is that we’ve finally hit the sweet spot in the stack, with the right abstractions for developers and for infrastructure and ops pros. With this innovation now accessible to all in the Kubernetes project, Google’s infrastructure for everyone else (#GIFEE) and NoOps are within reach. Best of all, the change this time around is less transformative and more incremental…

One thing you’ll like about a serverless architecture stack like Kubernetes is that you can run it on bare metal if you want the best performance possible, but you can just as easily run it on top of IaaS VMs in public or private cloud, which gives us a great deal of flexibility. Then, of course, if you just want to deploy workloads and not worry about the stack, an aaS offering like GKE or ECS is a great way to get to NoOps faster. We have a level playing field across public and private clouds and a variety of underpinnings.

For those that are not only using a public micro-service stack aaS offering like GKE, but supplementing or fully building one internally with Kubernetes, or with a PaaS on top of it like OpenShift, you’ll need some support. Just like you didn’t build an OpenStack IaaS by yourself (I hope), there’s no reason to go it alone for your serverless-architecture micro-services stack. There are many parts under the hood, and one that you need baked into your stack from the get-go is software-defined secure networking. It was a pleasure to get back in touch with my developer roots and put together a demo of how you can solve your networking and security microsegmentation challenges using OpenContrail.

I’ve taken the test setup for OpenContrail with OpenShift, and forked and modified it to create a pure demo cluster of OpenContrail + OpenShift (thus including Kubernetes), showing off the OpenContrail features with Kubernetes and OpenShift. If you learn by doing like me, then maybe best of all, this demo cluster is also open source and Ansible-automated, so it’s easy to stand up or tear down on AWS with just a few commands, going from nada to running OpenShift and OpenContrail consoles with a running sample app. Enjoy getting your hands dirty, or sit back and watch the demo video.

For those that want to replicate the demo, here's the link I mentioned in the video: https://github.com/jameskellynet/container-networking-ansible