Since Dev and Ops related activities blend together in a DevOps world, developers need to understand more than just coding their application. Their role also includes typical Ops work: packaging and provisioning environments, designing monitoring solutions, and responding to production incidents.
On the other hand, Operators are no longer ‘just’ infrastructure administrators. Since systems are designed and deployed using Infrastructure as Code techniques, and since they need to be resilient to operate correctly in the cloud, the role of Operators shifts toward qualitative aspects like resilience, cost, and removing and preventing technical debt.
Both of them need to work together to strive for the most secure environments possible. System hardening helps to make infrastructure (in the cloud) more secure. In this article we’ll explore what it is and help you get started.
What is system hardening?
A lot of debate, discussion, and tooling focuses on the security of the application layer. Especially when deploying in the cloud, you need to do more. Hackers and other bad guys probe your systems to find weaknesses in them. They explore the entire stack in search of a way to exploit it. Your runtime systems greatly contribute to your attack surface.
Think of open ports that should not be open in the first place, or an unpatched underlying Operating System. Another example is the container runtime environment: your containers can be secure, but if the runtime environment is not, your system is still vulnerable.
System hardening makes your environments more robust and more difficult to attack. It reduces the attack surface, thus also protecting your application and its data.
The rationale for DevOps teams
DevOps team members have a great number of tools to help them release their applications quickly and relatively easily. One of the main goals of these tools is to lower the barrier to pushing application components and configuration through CI/CD pipelines. However, in order to speed things up, most of these tools do not treat security as a first-class citizen. And since DevOps teams are under a lot of pressure and their primary task is the delivery of features, they tend to focus less on the security aspects.
Why “out of the box” is not good enough
Let’s look at some examples.
- Kubernetes has Role-Based Access Control (RBAC) enabled by default, but if you don’t specify any roles, you effectively work as an admin. This is undesirable. The same is true for Docker containers: by default, they run as root, which is also a big security issue.
- A lot of vendors shout out loud that their solution is “enterprise-grade”, but you should carefully analyze their offerings to make sure they adhere to your security standards and comply with your (internal) policies and regulations.
- Consider a software vendor who delivers you a set of unhardened AMIs to be used for your EC2 instances in Amazon. If these AMIs require an IAM role that can access all of your other cloud resources, this poses a great risk to you. You would apply a hardening technique to reduce this risk.
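To illustrate the Docker point above, here is a minimal sketch of a Dockerfile that drops the root default. The base image tag, user name, and commands are illustrative assumptions:

```dockerfile
# Sketch: run container processes as a non-root user.
FROM alpine:3.19

# Create an unprivileged system group and user.
RUN addgroup -S app && adduser -S -G app app

# Switch to the unprivileged user; everything after this
# line (and the container's main process) no longer runs as root.
USER app

CMD ["id"]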
The way we think about security needs to change.
Reducing the attack surface is a key aspect of system hardening. The following list gives you an overview of how to achieve this. These actions are platform-independent and you can apply them both in an on-premises environment and in the cloud.
List of common actions
- As part of the “defense in depth” concept, configure a host-based firewall besides the company’s firewall. This helps to stop malicious traffic which might have already reached your private subnets.
- Disable all default accounts if they are not needed. Change default passwords, or configure strong passwords if none are set at all. Avoid typical, easy-to-guess usernames and create non-default usernames with strong passwords.
- Make sure logging and auditing are set up correctly. For example: don’t log sensitive data, and use log rotation to prevent disks from filling up.
- Remove the packages from your Operating System or container images that are not absolutely needed.
- Correct or set file and directory permissions for sensitive files. Narrow down who can access them.
- Close unneeded network ports and disable unneeded services.
- Isolate systems across different accounts (in AWS) or environments (resource groups in Azure).
- And last but not least: encrypt all data on all systems using a strong encryption mechanism.
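A few of the actions above can be sketched in a small shell script. Everything here is illustrative: the file path is a stand-in for a real sensitive file, and the commented-out commands mark steps that need root privileges on a typical Linux system:

```shell
#!/bin/sh
# Sketch of a few common hardening actions; adapt paths, accounts,
# and services to your own distribution and policies.
set -eu

# Disable a default account instead of leaving it around (needs root):
# usermod --lock --expiredate 1 games

# Disable an unneeded service (needs root; systemd assumed):
# systemctl disable --now telnet.socket

# Tighten permissions on a sensitive file: owner read/write only.
SENSITIVE_FILE="${SENSITIVE_FILE:-/tmp/demo-secret.conf}"
touch "$SENSITIVE_FILE"
chmod 600 "$SENSITIVE_FILE"

# Show the resulting mode (GNU coreutils stat).
stat -c '%a' "$SENSITIVE_FILE"   # prints: 600
```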
Becoming and staying compliant with external regulatory requirements is a key concern for certain organizations, like the ones operating in the financial or medical industry. They often need to adhere to regulations such as PCI DSS or HIPAA. External auditors require them to demonstrate their policies and processes with regard to the handling of sensitive data. Traceability is a key aspect here. For these kinds of organizations, hardening is even more important. Often, the external regulations help to create a baseline for system hardening.
CISO departments of large enterprises can help you with system hardening. They have practical knowledge about the security of systems as well as information on compliance and regulations. Besides this, they also employ so-called “Red teams” which focus on ethical hacking in order to test out the security of your internal infrastructure and applications as well as the processes which apply in your organization.
Standards & guidelines
Extra help comes from standards and guidelines which are widely used by a lot of companies worldwide. Think of the following list to guide you in the right direction:
- Center for Internet Security (CIS). This organization releases CIS Benchmarks for major technology stacks such as Virtual Machines and containers. You can download them free of charge after registering. Use them to analyze your systems and make an informed decision on the risks you want or need to take; they are not a fixed checklist to follow blindly. It’s interesting to note that tools like Docker Bench and Sysdig Secure use the CIS Benchmark for containers.
- Focusing on the people and process side of things, Cisco provides a comprehensive page on building and operating an effective Security Operations Center. They propose a five-phase approach with a number of steps for each phase.
- The National Institute of Standards and Technology (NIST) was founded more than a century ago. Since then, they have specialized in a number of topics which also includes cybersecurity and vulnerability management. Their website is full of news and publications on a variety of topics.
Other standards and guidelines come from Red Hat and Oracle, to name a few. Of course their standards and guidelines target their own products, but they are a good reference for your own systems.
System hardening should not be done once and then forgotten. Hardening needs to take place every time:
- A new regulation or compliance rule shows up
- A new vulnerability surfaces
- A software application requires a change to the underlying system
- New exploits are discovered
Since a lot of these factors are triggered by external forces, it takes a while to get them implemented throughout a large organization. Sometimes processes take so long that the software to which they apply is already out of date. Yet another reason to automate the hardening process.
Software vendors do not always support their commercial software if you use your own hardened systems as a base to install it on. You need to negotiate with them and perhaps hand over all of your hardening scripts to inform them about the expected behavior of your system. You’re lucky if they are willing to cooperate. Ideally, they acknowledge this as a way to improve their products, but only if the changes stay compatible with their other customers.
Keep your balance
Another common challenge is finding the constant balance between functionality and the hardening restrictions which influence your system. Due to hardening practices, runtime errors can pop up at unexpected moments, and they can be difficult to trace. Execute hardening steps in small iterations and constantly test the new version of your scripts. Infrastructure as Code and automated tests are essential here. This takes time, so it’s wise to start with these processes early on.
The tricky aspect of patching packages is that it sometimes (read: often) resets your configuration to its default settings. Your hardening scripts need to be aware of this and should not take any setting for granted.
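One way to cope with this is to make every hardening step idempotent: enforce the desired value instead of assuming it is still there. A minimal sketch in shell, where the file path is an illustrative stand-in for a real config file such as sshd_config:

```shell
#!/bin/sh
# Sketch: enforce one setting idempotently, so re-running the script
# after a package update (which may have reset the file) restores the
# hardened value. CONF is a stand-in for /etc/ssh/sshd_config.
set -eu

CONF="${CONF:-/tmp/demo-sshd_config}"
KEY="PermitRootLogin"
WANT="no"

touch "$CONF"
if grep -q "^${KEY}" "$CONF"; then
    # Option present, possibly reset by a patch: force our value (GNU sed).
    sed -i "s/^${KEY}.*/${KEY} ${WANT}/" "$CONF"
else
    # Option absent: append it.
    printf '%s %s\n' "$KEY" "$WANT" >> "$CONF"
fi

grep "^${KEY}" "$CONF"   # prints: PermitRootLogin no
```

Running this script any number of times, before or after a patch, converges the file to the same state.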
Build hardened systems
Simply speaking, there are two common hardening strategies that are widely used.
The first one is based on the concept of the “golden image” which acts as the single source for any system which uses this type of image. A number of simple steps and rules of thumb apply here:
- Download a standardized Operating System distribution and install it on a “fresh” machine that has no fancy drivers or other requirements.
- Install the needed packages and patches.
- From here, run your hardening scripts.
HashiCorp Packer can help you here. You need only a single template to create an image for Docker, EC2 instances, Google Cloud, VMware, etc. Build once, run (almost) anywhere. This is what makes the tool so powerful.
To patch a system, you update the template and build a new image. This image is then pushed out to your systems. As mentioned in the article on immutable infrastructure, this helps to avoid technical debt.
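The golden-image steps above could look roughly like this in a Packer HCL template. The region, AMI ID, instance type, and script name are all illustrative assumptions, not working values:

```hcl
# Sketch of a Packer template for a hardened golden image (AWS example).
source "amazon-ebs" "hardened_base" {
  region        = "eu-west-1"
  source_ami    = "ami-0123456789abcdef0"   # a standardized OS distribution
  instance_type = "t3.micro"
  ssh_username  = "ec2-user"
  ami_name      = "hardened-base-${formatdate("YYYYMMDDhhmmss", timestamp())}"
}

build {
  sources = ["source.amazon-ebs.hardened_base"]

  # Install the needed packages and patches first...
  provisioner "shell" {
    inline = ["sudo yum update -y"]
  }

  # ...then run the hardening scripts.
  provisioner "shell" {
    script = "harden.sh"   # hypothetical hardening script
  }
}
```

Patching then means editing this template, rebuilding, and rolling out the new image instead of modifying running machines.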
Boot and run
The second strategy is quite the opposite of the first one. Using this approach, you typically follow these steps:
- Boot up your system
- Pull the hardening scripts and other code from your Git repository
- Run these scripts against your system. Tools like Chef and Puppet can help you here.
- Every X minutes, the state of the system is checked against the scripts in the repository and then synced. This avoids configuration drift. It only works correctly if no one changes anything manually on your running systems; this is a strict rule to prevent snowflake servers from being created.
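The steps above can be sketched as a simple pull-and-converge script. The repository location and script name are illustrative assumptions; tools like Chef and Puppet implement this pattern far more robustly:

```shell
#!/bin/sh
# Sketch of the boot-and-run convergence loop.
set -eu

REPO_DIR="${REPO_DIR:-/tmp/hardening-repo}"

apply_hardening() {
    # Re-apply the desired state. The scripts must be idempotent so
    # that running them every cycle is safe.
    [ -x "$REPO_DIR/harden.sh" ] && "$REPO_DIR/harden.sh"
    echo "converged"
}

# One-shot run shown here. In production, a timer (cron or a systemd
# timer) would run this every X minutes to correct configuration drift:
#   git -C "$REPO_DIR" pull --ff-only
apply_hardening
```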
An interesting addition to this approach is the availability of compliance tests. InSpec is a free tool on GitHub which you can use to check the compliance rules of your systems. Example use cases are:
- Use InSpec for test-driven compliance.
- Write hardening assertions (assumptions written in code) that fail when a hardening script is not applied yet.
- Execute InSpec on a running machine and compare the current state with the expected state. The check fails when the two are not the same.
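Such an assertion might look like the following InSpec control, a sketch in which the control name, file path, and expected mode are illustrative assumptions:

```ruby
# Sketch of an InSpec control: fails until the hardening script has run.
control 'ssh-hardening' do
  impact 1.0
  title 'SSH daemon must be hardened'

  # Assert a hardened sshd setting.
  describe sshd_config do
    its('PermitRootLogin') { should eq 'no' }
  end

  # Assert tightened file permissions.
  describe file('/etc/ssh/sshd_config') do
    it { should exist }
    its('mode') { should cmp '0600' }
  end
end
```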
Both strategies have pros and cons; be sure to choose what is applicable to your situation.
The role of business representatives
As always the business representatives play a vital role in these kinds of security topics. In the end, it’s the business that accepts or rejects a (security) risk. Given the fact that the cloud environment is “hostile by nature”, it leaves no doubt that hardening is an important aspect of your runtime systems.
One of the duties of the security folks is to inform the business, in non-technical terms, about the security concerns that need to be overcome. Of course this is sometimes extremely difficult since a lot of topics are highly technical. Therefore the business representatives need to understand and highly trust the security guys.
Business representatives need to free up time in the sprints for the DevOps teams to build and apply hardening scripts and to test out new “golden images”. Since the target platform is critical for the success of an application (in the cloud), this is not about technical debt.
System hardening helps to make your infrastructure more robust and less vulnerable to attacks. It aims to reduce the attack surface. There needs to be a good interplay between Ops and Developers to build and maintain hardened systems that also work correctly for various applications. Hardening scripts and templates can be built with tools like Chef, Ansible, Packer, and InSpec. Infrastructure as Code and automation are needed to constantly keep up with the everyday changes.