Compliance as Code initiatives help organizations become and stay compliant. In one of my previous posts, I highlighted this challenging topic and mainly focused on Open Policy Agent. Pod Security Policies help you when you run Kubernetes. This built-in feature is pretty easy to implement and use. In addition, it gives DevOps teams clear feedback on whether they are allowed to run an application with a specific configuration. In short, they help you to keep your workloads compliant.
Why is it important?
Since more and more enterprise-grade organizations run mission-critical applications on Kubernetes, it has become an attractive target for hackers. They are very well informed about the complexity of Kubernetes and its default security settings (which are often poorly implemented or forgotten altogether). Companies need to protect their workloads since attacks can come from multiple directions. The most common directions are:
- Inside the cluster: escape a container and become a privileged user on the Worker Node. From there on, your cluster is wide open and even your precious data is at risk.
- Outside of the cluster: exploit a weakness in the cluster API which poses big risks for the entire cluster since all requests are routed through the cluster API.
Pod Security Policies (PSPs) provide a great way to protect your workloads. This comes on top of your other security mechanisms like RBAC, private endpoints, hardened Worker Nodes, etc. PSPs give you fine-grained control over what is allowed, by whom, and what should be blocked. Microsoft recently published a great article focused on the threat matrix for Kubernetes. It addressed 31 major attack factors. PSPs help to tackle almost 10 of them.
What are Pod Security Policies?
Simply put, PSPs intercept every request that creates or alters a Pod. During this interception, the request is evaluated against the rules defined in the PSPs. Based on the outcome, the request is allowed or rejected.
How does it work?
First of all, you need to make sure Pod Security Policies are enabled for your cluster. A brief description per cloud provider:
- AKS: az aks update --resource-group aksResourceGroup --name demoAksCluster --enable-pod-security-policy
- AWS: All EKS Kubernetes clusters from v1.13 onwards have PSPs enabled by default. No action needed. Upgrade your clusters to 1.13+ to enable it.
- Google Cloud: run gcloud beta container clusters describe $CLUSTER_NAME --zone $CLUSTER_ZONE | grep -A 1 podSecurityPolicyConfig to check if it is enabled. Create or update a cluster using the --enable-pod-security-policy flag to enable it.
Once you have enabled PSPs on your cluster, you can deploy PSP resources. These resources are just like other Kubernetes resources, so the manifest files are written in YAML as well.
A sample Kubernetes resource of type “PodSecurityPolicy” is shown below:
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: <name-of-the-policy>
spec:
  <name-of-the-rule-to-deny-or-allow>
By default, policies apply to the entire cluster.
Role-Based Access Control
Using Role-Based Access Control (RBAC) you can define more fine-grained access control over where this policy should apply and where not. From a functional perspective, the following is needed:
- Create a Service Account for each application you want to be able to control. This is not required but helps you to keep controls dedicated to individual applications.
- Next, create a Role or Cluster Role to attach to the Service Account.
- Map the Service Account to a newly created Role or ClusterRole by creating a (Cluster)RoleBinding.
- Associate the Pod with the Service Account created in the first step by setting serviceAccountName in the Pod spec. If no Service Account is defined, the default is used.
As soon as the PSP is deployed, the Kubernetes admission controller intercepts the requests and evaluates whether the Pods (that use the Service Account) may be scheduled on a worker node. Special note: the Role or ClusterRole you defined in the second step needs the “use” verb on the PSP resource, otherwise the policy is not taken into account for those Pods.
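The steps above could be sketched as the following manifests. This is a minimal sketch rather than a definitive setup: the names demo-app, demo and psp-block-privileged-user are hypothetical, and the resourceNames entry assumes a PSP called block-privileged-container (the example policy used later in this post) exists in a cluster that still serves the policy/v1beta1 API.

```yaml
# Step 1: a dedicated Service Account for the application (hypothetical names).
apiVersion: v1
kind: ServiceAccount
metadata:
  name: demo-app
  namespace: demo
---
# Step 2: a ClusterRole that grants the "use" verb on one specific PSP.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp-block-privileged-user
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames:
  - block-privileged-container   # assumed name of an existing PSP
---
# Step 3: bind the ClusterRole to the Service Account within one namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: demo-app-psp
  namespace: demo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp-block-privileged-user
subjects:
- kind: ServiceAccount
  name: demo-app
  namespace: demo
---
# Step 4: the Pod references the Service Account in its spec.
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
  namespace: demo
spec:
  serviceAccountName: demo-app
  containers:
  - name: app
    image: nginx:1.17
```

Using a RoleBinding instead of a ClusterRoleBinding keeps the grant limited to the demo namespace, which fits the idea of keeping controls dedicated to individual applications.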
Categories of compliance rules
There are a lot of compliance rules and policies, so they are grouped into categories. The most common are:
- Privileged: prevent or allow a Pod to run in “privileged mode” (i.e. drop all security requirements).
- Host Namespaces: related to “special areas” on the Worker Node which the Pod has access to.
- Volumes and file systems: defines if and how Pods are allowed to use (persistent) volumes to store stateful data.
- FlexVolume drivers: defines which “FlexVolume” drivers may be used. These are needed to use volumes for stateful data.
- Users and groups: controls the user and group which are allowed to be used by the Pod. Avoid running as the root user (see example policy below).
- Privilege escalation: explicitly allow or deny extra privileges gained by the Pod and/or container (e.g. widening the scope of permissions).
- Capabilities: answers the question whether a container is allowed to use extra technical capabilities (e.g. setting the timezone on the Worker Node or reading the iptables firewall rules).
- Linux security (SELinux, AppArmor, sysctl): various security settings, mainly on the Virtual Machine level or on the container runtime environment. For example: allow specific proc mount types.
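To make the categories more concrete, the sketch below annotates one policy with the category each field belongs to. It is illustrative only: the policy name and the FlexVolume driver name are hypothetical, the chosen values are just one possible combination, and AppArmor settings are left out because they are configured through annotations rather than spec fields.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: category-overview          # hypothetical name
spec:
  privileged: false                # Privileged
  hostNetwork: false               # Host namespaces
  hostPID: false
  hostIPC: false
  volumes:                         # Volumes and file systems
  - configMap
  - secret
  - emptyDir
  - projected
  - persistentVolumeClaim
  - flexVolume
  allowedFlexVolumes:              # FlexVolume drivers
  - driver: example/lvm            # hypothetical driver name
  runAsUser:                       # Users and groups
    rule: MustRunAsNonRoot
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  allowPrivilegeEscalation: false  # Privilege escalation
  allowedCapabilities: []          # Capabilities: nothing may be added
  seLinux:                         # Linux security
    rule: RunAsAny
  forbiddenSysctls:
  - '*'
  allowedProcMountTypes:
  - Default
```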
To make life a bit easier, the Kubernetes community has already defined three levels of policies:
- Privileged: basically no protection, only enabling of the PSPs.
- Default: prevents known escalations, but is minimally restrictive.
- Restricted: hardening is a key priority here.
They provide a good baseline to start with.
Exceptions to the rules
The main problem with all of these policies is that you have to determine which rules are most important and which are not. Some capabilities (which, in the list above, are flagged as security issues) are needed for specific workloads. Think of the IPC_LOCK capability, which lets a process lock memory and is needed by applications like MySQL. The NET_ADMIN capability needs to be enabled to capture network traffic. These policies therefore require exceptions to the rules if you want to use the above-mentioned features.
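Such an exception could look like the sketch below, which permits exactly one extra capability while keeping the rest of the policy strict. The policy name is hypothetical, and the volume list and the RunAsAny rules are assumptions chosen to keep the example short.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: allow-net-admin            # hypothetical name
spec:
  privileged: false
  allowPrivilegeEscalation: false
  # The exception: containers may request NET_ADMIN, and nothing else,
  # on top of the runtime's default capability set.
  allowedCapabilities:
  - NET_ADMIN
  volumes:
  - configMap
  - secret
  - emptyDir
  - projected
  # These four strategies are mandatory fields of a PSP spec.
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
```

Granting the “use” verb on this policy only to the Service Account of the workload that captures traffic keeps the exception as narrow as possible.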
Security & Risk departments as well as compliance departments can and should help to determine the potential risks. To complicate matters, the rules are fairly intricate and these departments might not understand their full impact. For this reason, the engineers should help them out by jointly conducting a risk assessment on the rules per category.
Sometimes you need to annotate a RoleBinding or ClusterRoleBinding to enable a specific rule in a category. This makes things less transparent since you don’t have a clear overview of rules and policies in one single place.
In the end, it’s about the potential impact when an attacker exploits a certain weakness. Besides this, the blast radius should be taken into account. Both should be kept to a minimum while, at the same time, keeping things smooth so DevOps teams are not held back from shipping their business features. During the first phase of these policies, a trial-and-error approach is a good start.
Another consideration worth mentioning: PSPs do not exist on Windows-based systems, so those cannot be protected using the following examples.
It’s good to start creating specific “allow-policies” before you enable Pod Security Policies. If you don’t do so, not a single Pod can be scheduled and your cluster is completely broken.
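One possible shape for such an allow-policy is a fully permissive PSP, similar to the privileged example policy in the Kubernetes documentation. The sketch below assumes you bind it (through the “use” verb) only to the system components that need it before enabling the admission controller; the policy name is an assumption.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: privileged                 # assumed name
spec:
  # Everything is allowed; this policy restricts nothing.
  privileged: true
  allowPrivilegeEscalation: true
  allowedCapabilities:
  - '*'
  volumes:
  - '*'
  hostNetwork: true
  hostPorts:
  - min: 0
    max: 65535
  hostIPC: true
  hostPID: true
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
```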
Let’s make things a bit more practical now. The following examples are easy and give a big boost on the security and hardening aspects of your Kubernetes clusters. DevOps teams and security departments should welcome them.
Prevent privileged containers
Perhaps one of the most important policies: containers should not run in privileged mode. Simply put: privileged mode drops all security restrictions and should therefore be prevented whenever possible. Do a very thorough analysis before deciding that you really require it.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: block-privileged-container
spec:
  privileged: false
When this PSP is deployed, you can’t run a Pod like the following:
apiVersion: v1
kind: Pod
metadata:
  name: block-privileged-container
spec:
  containers:
  - name: pause
    image: nginx:1.17
    securityContext:
      privileged: true
Don’t run as root
Closely related to the previous one, but not exactly the same: everyone within the Linux ecosystem knows that processes should not run as root unless this is absolutely needed. However, for containers, this is the default. Running as root must be prevented as much as possible. For most container images it is as simple as adding a non-root user and switching to that user, so the process inside the container no longer runs as root.
Problems arise when the container is part of the Kubernetes system itself. Think of CNI to implement networking capabilities and Pod identity in AKS. You need to create exceptions for these use cases. And if you do so, make sure you make them as specific as possible.
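A policy for this rule could be sketched as follows. The policy name is hypothetical; the runAsUser strategy MustRunAsNonRoot is the relevant part, and the remaining fields are assumptions added because a PSP spec requires them and because an empty volumes list would otherwise forbid every volume type.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: block-root-user            # hypothetical name
spec:
  # Reject containers that would run with UID 0.
  runAsUser:
    rule: MustRunAsNonRoot
  # Stop a non-root process from gaining extra privileges later on.
  allowPrivilegeEscalation: false
  volumes:
  - configMap
  - secret
  - emptyDir
  - projected
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
```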
Read-only filesystem
Containers should not modify their own filesystem or the filesystem of the worker node on which they run. This is especially true considering immutable infrastructure. In addition to that, it helps to prevent malicious processes from running and tampering with the containers in other ways.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: read-only-filesystem
spec:
  readOnlyRootFilesystem: true
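A Pod that complies with this rule usually still needs a few writable paths. The sketch below shows one way to provide them with emptyDir volumes. The Pod name is hypothetical, and the two mount paths are an assumption about where the nginx image writes its cache and PID file; other images will need different paths.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: read-only-demo             # hypothetical name
spec:
  containers:
  - name: web
    image: nginx:1.17
    securityContext:
      # The root filesystem of the container is mounted read-only.
      readOnlyRootFilesystem: true
    volumeMounts:
    # Writable scratch space for the paths the process writes to (assumed).
    - name: cache
      mountPath: /var/cache/nginx
    - name: run
      mountPath: /var/run
  volumes:
  - name: cache
    emptyDir: {}
  - name: run
    emptyDir: {}
```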
Grouping policies together
As seen in the previous examples, there is a lot of duplication and boilerplate code. Individual rules are good to try things out, but grouping them together is more convenient and easier to maintain. This is as easy as listing multiple rules under the single “spec” element in your YAML file.
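As a sketch, the rules from the previous sections could be combined into a single policy like this. The policy name is hypothetical, and the volume list and the RunAsAny strategies are assumptions that fill in the fields a PSP spec requires.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: hardened-workloads         # hypothetical name
spec:
  # Rule 1: no privileged containers.
  privileged: false
  allowPrivilegeEscalation: false
  # Rule 2: containers must not run as root.
  runAsUser:
    rule: MustRunAsNonRoot
  # Rule 3: the root filesystem of every container is read-only.
  readOnlyRootFilesystem: true
  volumes:
  - configMap
  - secret
  - emptyDir
  - projected
  - persistentVolumeClaim
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
```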
So what happens when there are multiple policies available?
- First: policies that allow the Pod as-is, without changing defaults or mutating the Pod, are preferred. The order of these non-mutating policies does not matter.
- Second: when a Pod is created and must be defaulted or mutated, the first policy in alphabetical order that allows the Pod is used.
- Last: when a Pod is updated and only a mutating policy would allow it, an error is returned. This is because mutations to the Pod spec are not allowed during update operations.
Tools will help you
It is no surprise that cloud-native security tools jump in to help you set up and maintain PSPs. Sysdig launched an open-source tool called “kube-psp-advisor” that scans your Kubernetes cluster and uses it as a reference model. It looks at different attributes (e.g. hostIPC, privileged, Volume, etc.) of your current resources and, based on that, creates the recommended PSPs for you.
Kube PSP advisor has two important parameters:
- namespace: to select the namespace for the PSP. If not specified, the policy applies to the entire cluster.
- report: shows the current resources which use a Kubernetes security-related feature. Based on this, the PSP is created. It’s a bit like reverse engineering the PSP based on actual deployments. You can use this feature as a “dry-run” method before you actually create the PSP.
Rancher, the famous container platform, also offers help. You can create default PSPs as well as customized ones using the graphical user interface. There, you can also switch policies on and off on the fly and bind them to specific pods in a given namespace.
Open Policy Agent can also handle Pod Security Policies. You need to write the policies in the Rego policy language. OPA compiles the policies and intercepts the requests just like Kubernetes would.
A special note on Azure and AKS
Microsoft quite recently announced that Azure Policy will replace Pod Security Policy. Currently, Azure Policy is in preview mode. Starting the 15th of October it will go into effect. From then on, Pod Security Policies for AKS will be unsupported if you want to upgrade your existing clusters. Please consult the official Microsoft documentation on how to enable and use Azure Policy for AKS.
Pod Security Policies help you to improve the security of your Kubernetes workloads. All major cloud providers support them (out of the box). Policies are written like regular Kubernetes resources and they are easy to set up. However, spend some time with the DevOps teams and the security department to handle the exceptions. I hope this article has inspired you to add PSPs to your clusters so you are even better protected.