Serverless Containers Intensify Secure Networking Requirements by James Kelly

container-ship-2856899_1920.jpg

When you’re off to the races with Kubernetes, the first order of business as a developer is figuring out a micro-services architecture and your DevOps pipeline to build pods. However, if you are the Kubernetes cluster I&O pro, also known as site reliability engineer (SRE), then your first order of business is figuring out Kubernetes itself, as the cluster becomes the computer for pod-packaged applications. One of the things a cluster SRE deals with—even for managed Kubernetes offerings like GKE—is the cluster infrastructure: servers, VMs or IaaS. These servers are known as the Kubernetes nodes.

The Pursuit of Efficiency

When you get into higher-order Kubernetes there are a few things you chase.

First, multi-purpose clusters can squeeze more efficiency out of the underlying server resources and your SRE time. By multi-purpose cluster, I mean running a multitude of applications, projects, teams/tenants, and DevOps pipeline stages (dev/test, build/bake, staging, production), all on the same cluster.

When you’re new to Kubernetes, such dimensions are often split across separate clusters, per project, per team, etc. As your K8s journey matures, though, there is only so long you can ignore the waste this causes in underlying server-resource capacity. Across your multicloud, consolidating many clusters into as few as practical for your reliability constraints also saves you time and reduces swivel-chairing on: patching, cluster upgrades, secrets and artifact distribution, compliance, monitoring, and more.
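As a concrete illustration of that consolidation, here is a hedged sketch of how one multi-purpose cluster might be carved into per-team, per-stage namespaces with resource quotas. The team names, stage names and quota values are my own examples, not from the article:

```python
def namespace_manifest(team, stage):
    """Build a Kubernetes Namespace manifest for one team/pipeline-stage pair."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": f"{team}-{stage}",
            "labels": {"team": team, "stage": stage},
        },
    }

def quota_manifest(team, stage, cpu, memory):
    """Cap each tenant's slice of the shared cluster with a ResourceQuota."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "tenant-quota", "namespace": f"{team}-{stage}"},
        "spec": {"hard": {"requests.cpu": cpu, "requests.memory": memory}},
    }

# One shared cluster, many tenants and pipeline stages:
manifests = [
    namespace_manifest(team, stage)
    for team in ("payments", "search")
    for stage in ("dev", "staging", "production")
]
```

With quotas bounding each tenant, one cluster can absorb the workloads of many, which is exactly where the server-capacity savings come from.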

Second, there’s the constant chase of scaling efficiency. Kubernetes and active monitoring agents can help take care of auto-scaling individual micro-services, but scaling out assumes you have capacity in your cluster nodes. Especially if you’re running your cluster atop IaaS, it’s wasteful to maintain—and pay for—extra capacity in spare VM instances. You probably need some buffer, though, because spinning up VMs is much slower than spinning up containers and pods. Dynamically right-sizing your cluster is quite the predicament, particularly as it becomes more multi-purpose.
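To make the buffer trade-off concrete, here is a back-of-the-envelope sketch. Every number in it is an illustrative assumption, not a measurement: the slower your VMs boot relative to how fast pod demand grows, the more spare node capacity you must keep warm (and pay for).

```python
import math

def warm_buffer_nodes(pods_per_minute, vm_boot_minutes, pods_per_node):
    """Spare nodes needed to absorb a pod scale-out while new VMs boot.

    pods_per_minute:  peak rate at which autoscaling adds pods
    vm_boot_minutes:  time for a new cluster node (VM) to become Ready
    pods_per_node:    average pod packing density per node
    """
    pods_arriving_during_boot = pods_per_minute * vm_boot_minutes
    return math.ceil(pods_arriving_during_boot / pods_per_node)

# Illustrative numbers: a 30 pods/min burst, 4-minute VM boot, 20 pods/node
# means 6 nodes of idle capacity just to bridge the VM spin-up window.
buffer = warm_buffer_nodes(30, 4, 20)
```

Shrinking either the VM boot time or the burst rate shrinks the buffer, which is why serverless container offerings that remove the node layer entirely are so attractive.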

True CaaS: Serverless Containers

When it comes to right-sizing your cluster scale, while the cloud providers are happy to take your money for extra VMs powering your spare node capacity, they do have a better solution. At re:Invent 2017, AWS announced Fargate to abstract away the servers underneath your cluster. Eventually it should support EKS in addition to ECS. In the meantime, Azure Container Instances (ACI) is a true Kubernetes-pods-as-a-service offering that frees you from worrying about the server group upon which it’s running.

Higher-order Kubernetes SRE

While at Networking Field Day 17 (NFD17 video recording), I presented on “shifting left” your networking and security considerations to deal with DevOps and multi-purpose clusters. It turns out that on the same day, Software Engineering Daily released their Serverless Containers podcast. In listening to it you’ll realize that such serverless container stacks are probably the epitome of multi-purpose Kubernetes clusters.

What cloud providers offer in terms of separation of concerns with serverless container stacks, great cluster SREs will also aim to provide to the developers they support.

When you get to this level of maturity in Kubernetes operations, you’re thinking about a lot of things that you may not have originally considered. This happens in many areas, but certainly in networking and security. Hence my talking about “shift left,” so you can prepare to meet certain challenges that you otherwise wouldn’t see if you’re just getting Kubernetes up and running (great book by that name).

In the domain of open networking and security, there is no project that approaches the scalability and maturity of OpenContrail. You may have heard of the immortal moment, at least in the community, when AT&T chose it to run their 100+ clouds, some of enormous size. Riot Games has also blogged about how it underpins their DevOps and container runtime environments for League of Legends, one of the biggest online games around.

Cloud-grade Networking and Security

For cluster multi-tenancy, it goes without saying that it’s useful to have multi-tenant networking and security like OpenContrail provides. You can hack together isolation boundaries with access policies in simpler SDN systems (which are, indeed, more popular today due to their simplicity), but actually having multi-tenant domain and project isolation in your SDN system is far more elegant, scalable and sane. It’s a cleaner hierarchy to contain virtual network designs, IP address management, network policy and stateful security policy.

The other topic I covered at NFD17 is the goal of making networking and security more invisible to the cluster SRE and certainly to the developer, but providing plenty of control and visibility to the security and network reliability engineers (NREs) or NetOps/SecOps pros. OpenContrail helps here in two crucial ways.

First, virtual network overlays are a first-class concept and object. This is very useful for your DevOps pipeline because you can create exactly the same networking and security environment for your staging and production deployments (here’s how Riot does it). Landmines lurk when staging and production aren’t really the same, but with OpenContrail you can easily have exactly the same IP subnets, addresses, networking and security policies. This is impractical, if not impossible, without overlays. You may also perceive that overlays are themselves a healthy separation of concerns from the underlay transport network. That’s true, and they easily enable you to use OpenContrail across the multicloud on any infrastructure. You can even nest OpenContrail inside of lower-layer OpenContrail overlays, although for OpenStack underlays, it provides ways to collapse such layers too.

Second, OpenContrail can secure applications on Kubernetes with better invisibility to your developers—and transparency to SecOps. Today, a CNI provider for Kubernetes implements pod connectivity and usually NetworkPolicy objects. OpenContrail does this too, and much more that other CNI providers cannot. But do you really want to require your developers to write Kubernetes NetworkPolicy objects to blacklist-whitelist the inter-micro-service access across application tiers, DevOps stages, namespaces, etc? I’d love to say security is shifting left into developers’ minds, and that they’ll get this right, but realistically when they have to write code, tests, fixes, documentation and more, why not take this off their plates? With OpenContrail you can easily implement security policies that are outside of Kubernetes and outside of the developers’ purview. I think that’s a good idea for the sanity of developers, but also to solve growing security complexity in multi-purpose clusters.
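For a sense of what’s being taken off developers’ plates, here is roughly the shape of a single Kubernetes NetworkPolicy, built as a plain manifest dict. The app, namespace and port names are hypothetical examples of mine; multiply one of these per allowed edge across tiers, stages and namespaces to see how the burden grows:

```python
def allow_from_tier(namespace, to_app, from_app, port):
    """NetworkPolicy whitelisting one app tier to reach another on one port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": f"allow-{from_app}-to-{to_app}",
            "namespace": namespace,
        },
        "spec": {
            # The pods this policy protects:
            "podSelector": {"matchLabels": {"app": to_app}},
            "policyTypes": ["Ingress"],
            # The only traffic allowed in:
            "ingress": [{
                "from": [{"podSelector": {"matchLabels": {"app": from_app}}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }

# One manifest per allowed edge -- e.g. the api tier reaching the db tier:
policy = allow_from_tier("shop-prod", to_app="db", from_app="api", port=5432)
```

Keeping this kind of policy outside Kubernetes, as the article suggests, means developers never have to author or maintain these objects at all.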

If you’ve made it this far, I hope you won’t be willfully blind to the Kubernetes SRE-fu you’ll need sooner or later. Definitely give OpenContrail a try for your K8s networking-security needs. The community has recently made it much more accessible to quick-start with Helm packaging, and the work continues to make day-1 as easy as possible. The Slack team is also helpful. The good news is that with the OpenContrail project battle-tested and going on five years old, your day-N should be smooth and steady.

PS. OpenContrail will soon be joining Linux Foundation Networking, and likely renamed, making this article a vestige of early SDN and cloud-native antiquity.

image credit alexandersonscc / Pixabay

#SimpleAF Networking in 2018 by James Kelly

stone-simple-network.jpg

Last week at its flagship customer event, NXTWORK, Juniper Networks set a valiant vision for its role in the future of networking: “Engineering. Simplicity.” Next week as we take respite from engineering and set some 2018 goals, simplicity sounds good. Here are some of my ideas inspired by engineering for simplicity and Juniper’s new #simpleAF tag. Simple as fishsticks…of course.

Da Vinci called simplicity the ultimate sophistication. It would come more easily to those solving more primitive challenges, but maybe you, like Juniper, audaciously tackle cloud-grade problems, and in such domains and at such scale, simplicity is anything but simple to achieve.

The thing about creating #simpleAF simplicity is that it’s not just about tools or products. If you’re a network operator, another tool won’t make a revolutionary or lasting impact toward simplifying your life any more than the momentary joy of a holiday gift. Big impacts and life changes start inside out. They don’t happen have-do-be; rather, they are be-do-have. Juniper is doing its own work to put simplicity at the core of its being, but this article, besides some gratuitous predictions, is about transforming your network operations, putting simplicity at your core for a happier, more prosperous 2018.

Be… the NetOps Change

To be the engineer of network simplicity, it’s time to lose the title of network admin, operator or architect and embrace a new identity as a:

Network Reliability Engineer

Just like sysadmins have graduated from technicians to technologists as SREs, the NRE title is a declaration of a new culture and serves as the zenith for all that we do and have as engineers of network invincibility. And where could invincibility be more important than at the base of the infrastructure that connects everything?

Start with this bold title, and fake it until you make it who you truly are.

Do… It Like They Do on the DevOps Channel

Just like SREs describe their practices as DevOps, network reliability engineering embraces DevNetOps.

While Dev and (app) Ops are working closely together atop cloud-native infrastructures like Kubernetes, the cluster SRE is the crucial ops role that creates operational simplicity by designing in a separation of concerns. Similarly, the NRE can design for simplicity by offering up an API contract to the network for its consumers—probably the IaaS and cluster SREs, in fact.

The lesson here is that boundaries are healthy. Separate concerns by APIs and SLA contracts to deliver simplicity to yourself as well as up the chain, whether that is to another overlay network or another kind of orchestration system.

It’s important that your foundational network layers achieve simplicity and elegance too. Trying to put an abstraction layer around and over top of a hot mess is a gift to no one, least of all yourself if it’s your mess.

So how do we clean up the painful mess that is the job of operating networks today? I’ve examined borrowing the good habits of SREs and DevOps pros before, in the shape of DevNetOps blogs and slideshares. Here is a quick recap of good behaviors to move you from the naughty to the nice list, along with my predictions for 2018.

1.     Start with Networks as Code
Prediction: When people say the network CLI is dead, they jump straight to thinking about GUIs. Well, for provisioning changes, you ought to start thinking about Eclipse Che or an IDE instead of product GUIs, and start thinking about GUIs more for observability and management by metrics. Networks as code starts with good coding logistics. This coming year, I think we’ll see DevNetOpserati practice this simple truth and realize the harder one: that networks as code is better on top of automated networking systems themselves. In other words, networking and config as code belong on top of SDN “intent-driven” systems, not box-by-box configurations in your GitHub repo; nevertheless, that may be a good step depending on where you are in your journey.

2.     Orchestrate Your Timeline as a Pipeline
Prediction: It follows from coding habits that you follow a review and testing process for continuous integration (CI). While vendors dabble in testing automation services and simulation tools, I predict that we will see more of a focus on these in 2018, along with opinionated tools that orchestrate the process workflow of CI and eventually continuous delivery/deployment (CD). This is whitespace for vendor offerings right now, and the task is ripe for truly upleveling operator simplicity with process innovation. While the industry talks about automation systems like event-driven infrastructure (EDI) and continuous response, mature DevOps tooling is ready and waiting for DevNetOps CI process pipeline automation. Furthermore, it’s more accessible to codify or program, with a higher reliability ROI.

3.     Micro and Immutable Architecture
Prediction: 2017 was the year everyone went cuckoo for Kubernetes and containers. It has mainstream adoption for many kinds of applications, but networking systems from OSs to SDNs to VNFs are all lagging on the curve of refactoring into containers. I’ve reported on how finer-grained architectures are the most felicitous for DevOps, DevNetOps, and the software and hardware transformation that networking needs in order to achieve the agility of CD. We must properly architect before we automate. We’ve started to see containers hit some SDN systems in 2017, but I predict 2018 will be the year we start to see VNF waypoints as containers with multi-interface support in CNI and a maturation of higher-order networking in the ruling orchestrator, Kubernetes.

4.     Orchestrated CD from Your Pipeline
Prediction: I’m doubtful that we’ll hit this mark in 2018 in the area of DevNetOps, mostly because of some of the above prerequisites. On the flip side, it’s obvious that in 2018 we are going to see the network play a big role in micro-services CD thanks to the rise of service meshes that do a much better job of canary and blue-green deployments. CD for actual networking systems is likely experimental in 2018 or bleeding edge for some SDN systems.

5.     Resiliency Design and Drills
Prediction: This is the fun practice of chaos engineering, designing and automating around failures to stave off black-swan catastrophes. This design is already showing up in preference for more smaller boxes instead of fewer larger boxes. Most enterprises are also getting around to SD-WAN to use simultaneous hybrid WAN connectivity options. There is more to do here in terms of tooling and testing drills in CI pipelines, as well as embracing the “sadistic” side of the NRE culture that kills things for fun to measure and plan for invincibility or automatic recovery. In 2018, I think this will continue to focus on evolving availability designs until we make more progress in areas 2 and 3 above.

6.     Continuous Measurement
Prediction: The SRE and NRE culture is one of management by metrics; thus, analytics are imperative. There is always progress happening in the area of telemetry and analytics, and I know at Juniper we continue to push forward OpenNTI, JTI and AppFormix. 2018 beckons with the opportunity to do more with collection systems like Prometheus and AppFormix, employing narrow-AI deep learning for the first time to surface new insights beyond normal human observation.

7.     Continuous Response
Prediction: With the 2017 rage around intent-based or -driven networking, there is sure to be progress in 2018. While the past few years have focused on EDI, I think the most useful EDI actions are largely poised to get subsumed into SDN systems with controllers in them, similar to how Kubernetes’ controllers implement the continuous observation and correction to the declared state. There is, however, a tribe of NREs that looks to automate across networking systems and other IT systems like incident management, ticketing and ChatOps tools. As I wrote about in May, I think the maturing serverless or FaaS space will eventually win as the right platform for these custom EDI actions.

Have… a Happier Holiday

As an NRE, it’s not just about doing DevNetOps behaviors or processes, nor is it about having the greatest tools or code. When you’re network reliability engineering for simplicity, it’s equally about what’s really important when you take the NRE cape off and go home: not getting called in and enjoying a happier holiday. And so, simplicity is what I wish for you this holiday season, and for the next one, may you be further down the road of engineering simplicity.

image credit Manuchi/Pixabay

Forging DevOps Culture with Hedge-fund Flair by James Kelly

teamwork-culture shutterstock_506137132.jpg

People: your most important resource and your greatest obstacle to DevOps potency.

When the DevOps consultants recess and you need to scale a pilot-project team’s savvy, how do you instill DevOps principles across the wider organization?

Balancing the ingredients of this so-called mentality is trickier than revamping tools and processes. We all know to let tooling lead thy process, and process lead thy tooling. We know the approach is a rolling upgrade, not a mass reboot.

But in the plethora of chapter and verse on DevOps, cultural principles are still scarce—not another definition, nor “automate everything,” nor the trite “dev and ops working closely,” but real principles of cultural behaviors, their reasoning and an implementation track record.

When I was poring over the pages of Principles by the Steve Jobs of investing, Ray Dalio, I was expecting to learn about life, finance and business from this famed hedge-fund investment and business guru. I did. I also realized that Ray’s high-performing investment and management principles codify common aspects of the DevOps mentality, with some new ideas and revisions. And he’s got the CEO and CIO track record to support it; only his C-level ‘I’ stands for investment.

In the spirit of the ‘S’ for sharing in DevOps’s CALMS, Ray has provided a principles manifesto in clear, practical terms. I won’t reveal them all—I encourage you to read the book for that—but here are five of his greatest principles, distilled and steeped with my own perspective for the DevOps anthology.

1. Expedite Evolution, Not Perfection

From the opening biography, we come to know Ray as a continual learner by trial and error. He’s always looking for lessons in failures to carry forward, to do it better next time. He doesn’t regret failures; he values them more than successes because they provide learning.

Ray tells how he wouldn’t be where he is today—one of TIME’s top-100 most influential people in the world—if he had not hit rock bottom, having to let go of all his employees and forced to borrow $4000 from his dad to pay household bills until his family could sell their second car.

Because Ray upcycles painful mistakes into lessons and principles, learning and efficiency compound. He embraces evolutionary cycles, and knows a thing or two about compounding. Our human intelligence allows us to falter and adapt in rapid cycles that compound wisdom, without waiting generations for the effects. This iterative, rather than intellectual, approach performs better, with the added benefit that, being experiential, you know it works.

If you’re a DevOps advocate, your Kaizen lightbulb may have lit. Kaizen is continuous learning: as I say, it’s the most important of all continuous practices in DevOps—and in life. Drawing from Ray’s rapid iteration of trial, error, reflect and learn, we see how he pairs Kaizen with Agile, values learning from failure, and takes many small quick steps for faster evolution.

To solidify the value behind this concept pairing, imagine a fixed savings interest rate, but change the cycle. What’s better: 12% annually or 1% monthly? “Periods do Matter” in this Investopedia article will show you that shorter cycles are better than longer ones. That is the technical reasoning behind why failing faster leads to better evolution.
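The arithmetic is worth spelling out. Twelve 1% monthly cycles compound past a single 12% annual cycle:

```python
annual = 1.12                    # one 12% cycle per year
monthly = (1 + 0.12 / 12) ** 12  # twelve 1% cycles per year

# Effective annual rate of monthly compounding: about 12.68%.
effective_monthly_rate = monthly - 1
```

Each small cycle builds on the gains of the previous one, which is exactly the argument for rapid iteration over one big yearly retrospective.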

In another great read, 4 Seconds, Peter Bregman exemplifies how to manage learning and failure in business by telling the story of teaching his daughter to ride a bike without training wheels. Managing is knowing just the right time to step in and catch her. Too soon and she won’t learn to rebalance herself. Too late and...wipeout! He explains, “Learning to ride a bike, learning anything actually, isn't about doing it right: it's about doing it wrong and then adjusting. Learning isn't about being in balance, it's about recovering balance. And you can't recover balance if someone keeps you from losing balance in the first place.”

In summary, allow failure, cycle quickly and record the lessons. Deprive your people of the opportunity to fail, and you deprive them of the opportunity to succeed—and the opportunity to improve. Breed a culture of rapid feedback and experimentation with guardrails, allowing failure without fatality.

2. Triangulate and Be Actively Open Minded

DevOps aficionados are familiar with “fail fast,” Agile and Kaizen. What’s further interesting about Ray is how he allows for failure and equally reaches for high standards. And beyond technology, excellence is rarely discussed in DevOps circles.

Ray pursues life’s best. “You can have virtually anything you want, you just can’t have everything you want,” he says. Aside from his uncompromising principles in hiring and maintaining excellent people, Ray insists on excellent decision making to instill quality into evolution.

If failure doesn’t inform progress, “fail fast” falls flat. Just like machine learning uses new and quality data to improve, our cycle progress is proportional to the quality and newness of the abilities and information we use to pursue our goals.

The approach Ray hammers again and again is triangulation: exploring opinions different from his own or the first one offered up. Varying judgments can’t all be right, but understanding different viewpoints is like making a quantum leap in an evolutionary cycle compared to learning from one source.

Ray’s dramatic story of receiving a cancer diagnosis indelibly impresses the importance of triangulation.

Obviously shaken up, he began estate planning and spending more time with family, but he also consulted three experts. The first two doctors had wildly different prognoses and proposals for treatment or surgery. So he got them speaking with one another; they were respectful in understanding each other’s take, and Ray learned a lot. Finally, the third doctor suggested a regimen of no treatment or surgery, but instead monitoring a biopsy of the cells every six months, because his data showed treatments and surgery didn’t necessarily extend life in cases of cancer of the esophagus.

The three specialists, Ray and his family doctor agreed that this final approach wouldn’t hurt. The learning value of this triangulation aside, the outcome of the story will floor you: Ray’s first biopsy showed that he didn’t have any cancerous cells.

Back to DevOps, the CALMS ‘S’ for sharing is brilliant, but we can push beyond sharing. Actively seeking, not only sharing, information is key to boosting the quality of our decision making and evolution. Companies like Google do this with a manic focus on data, and data is just one avenue of information that may or may not go against our own beliefs.

In general, DevOps leaders must advocate for a culture and habits of active open mindedness, seeking opinions of other believable people and data. Like Ray, assertively explain your own opinions, while maintaining poise and humility to change your mind.

3. Radical Truth and Transparency

At the heart of Ray’s high-performing company, Bridgewater, is a culture of radical truth and transparency. Their patriarch trusts in truth, and loves his people like family, but also equitably protects the whole more than any part. For the greater good he doesn’t hold back in accurate evaluation, root-cause analysis, and openly pointing out problems, even in people. “Love the people you shoot,” he writes, “Tough love is both the hardest and most important kind of love to give.”

The firm keeps an internally available “baseball card” on each employee’s strengths and weaknesses, synthesized from evidence-based patterns and a collection of business tools with psychographic-data-crunching backends. Weaknesses aren’t misconstrued as signs of weak people, and employees aren’t pigeonholed; the transparency enables orchestrating employees’ best work and identifying their believability in decision making.

For decades, Bridgewater has been using data on people and their track records to do believability-weighted decision making with the help of computers. The company’s “Idea Meritocracy” tools like the Dot Collector Matrix collect data and help teams make believability-weighted decisions, even instantly in meetings. While this was pioneered for investment decisions, Bridgewater later adopted the system for management decisions. Ray also hints he’s working toward offering the system as a service.
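At its core, believability weighting is just a weighted vote: each person’s voice counts in proportion to their demonstrated track record on that kind of decision. The sketch below is my own simplification for illustration, not Bridgewater’s actual algorithm:

```python
def believability_weighted_vote(votes):
    """Decide yes/no by weighting each vote by the voter's believability.

    votes: list of (vote, believability) pairs, where vote is +1 (for)
    or -1 (against) and believability is a track-record weight >= 0.
    """
    score = sum(vote * weight for vote, weight in votes)
    return "for" if score > 0 else "against"

# One proven expert for (weight 0.9) can outweigh three novices against:
decision = believability_weighted_vote([(+1, 0.9), (-1, 0.2), (-1, 0.2), (-1, 0.2)])
```

The interesting design choice is that the weights themselves come from evidence—the “baseball card” data—rather than from title or seniority.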

This principle is about being ruthless in demanding integrity, honesty, accuracy and openness. Common workplace biases like loyalty, niceness, confidentiality and secrecy might seem safe or well-intentioned in small contexts, but are ultimately self-defeating of the big-picture success of the whole.

Every person and organization has a unique twist on values and workplace politics, but while Bridgewater’s success speaks volumes, its radically straightforward approach is also reported to be the preference of the techies and millennials who make up many DevOps-forward teams.

Embracing DevOps results in more than dev and ops working together—it’s working more closely with the business too. While DevOps leaders can’t control the culture in the wider organization, they can shape the subculture of their own teams. Not only is it more manageable at that scale, but this cultural principle and its corresponding tools seem a natural fit for IT workers. Just maybe, as the role of IT grows in most businesses today, the culture will catch on.

4. Be Candid and Fearless, Rather Than Blameless

Blameless post-mortem or retrospective meetings are not uncommon in a DevOps culture.

You can probably guess how Ray might see this differently.

If your culture is blameless, there’s less accountability, so you’re more likely to miss lessons and chances for improvement. It’s not just about fixing the machine, either; it’s about helping individuals. And if someone is truly not capable, you could fail to see it if you don’t dispassionately trace the blame.

Accuracy requires great diagnosis, and Ray’s method for root-cause analysis makes Toyota’s 5 Whys look skin deep.

Ray advocates keeping the people responsible for investigation reporting up a chain independent of where the diagnosis happens, so there’s no fear of recrimination. “Remember people tend to be more defensive than self-critical. It is your job as a manager to get at truth and excellence, not to make people happy. Everyone's objective must be to get the best answers, not the answers that will make the most people happy.”

Having said that, Bridgewater’s culture also pushes everyone to tell the truth without fear of adverse consequences from admission of mistake.

When an employee missed placing a trade for a client, it ended up costing millions for the younger, smaller, less-capitalized Bridgewater. With the whole company watching, so to speak, Ray decided not to fire this employee—he knew that would lead to a culture of people hiding their mistakes instead of bringing them to light as soon as possible.

With respect to handling missteps, this hedge funder would admonish blamelessness in favor of candor and staff fearlessness. Candor has the efficiency of earlier learning and earlier redesign for prevention. It also doesn’t eschew accountability, encouraging individual improvement that eventually lifts the whole team.

5. Management by Machine and Metrics

Techies will appreciate how Ray talks about his business as a machine.

If you have great principles that guide you from your values to your day-to-day decisions, but don't have a way of making sure they're systematically applied, you leave their usefulness to chance. We need to cement culture into habits and help others do so as well. Systematizing any cultural principle into a process, a tool, or both “typically takes twice as long,” Ray says, “but pays off many times over because the learning compounds.”

Bridgewater always put investment principles into algorithms and expert systems, and has long since run the rest of their business by software machinery as well.

Is this just the well-worn “automate everything” DevOps call?

Automation advances scale, performance, correctness, consistency and instrumentation. But high-performing businesses like Bridgewater also manage by metrics: they compare outcomes and measurements to goals.

While data-driven decision making is prominent these days, data-driven measurement and accountability is less common. We have KPIs, QBRs and performance reviews, but how many teams are consistently managed by metrics? We more easily look forward than take an objective look in the mirror, even though the latter is critical for evolution.

Goal-setting zealots argue that goals must be measurable, and Ray’s advice takes it one step further: Don’t look at the numbers you have and adapt them to your needs. Instead, “start with the most important questions and come up with the metrics that will answer them,” he says. “Remember any single metric can mislead.” Furthermore, like big-data analytics, data garbage in equals information garbage out.

Be the Change

Ray also says, “An organization is the opposite of a building: its foundation is at the top.”

But we all know stories of change percolating from all levels of organizations, communities and countries. If you’re not a CEO like Ray was, you can still make a meaningful difference bottom-up or managing your own team, leading by example.

You could simply publish your team’s principles, create a tool, or ignite behaviors you want to spread. Of the DevOps people-process-technology, people are your most important resource; so forge the principles of their operating systems: sharpen, tweak, prioritize and balance. With the transformation door open in your digital business and DevOps journey, there’s no better time to make an invaluable mark on culture—in IT and beyond.

image credit Jacob Lund/Shutterstock