Event management and telemetry monitoring are complex in even the simplest of environments, and in a multi-cloud world that complexity is multiplied. The ephemeral nature of many services (or the scaling up and down of their use), together with more flexible infrastructure, puts pressure on the standards of yore. Syslog, for instance, was written for a simpler age: an age of static ‘pet’ infrastructure, where a server going down meant a service outage. Today’s applications are built with resilience in mind, and the loss of a single node or server, although a cause for some triage, is no longer a reason to raise an alert or page a human to take a look immediately. Sysdig Monitor (product page) provides an event and telemetry monitoring tool for the modern age.
This, however, does not mean that there is no longer a need to monitor the environment. The truth is that as the world becomes more abstracted, with its supposed simplification, environments are becoming much more complex. Managing the flow of monitoring telemetry is more important today than ever before: it is very unlikely that any human can successfully monitor today’s fast-moving infrastructure, let alone the next-generation infrastructure where Kubernetes creates and destroys containers at the speed of need. This is where next-generation monitors come into play.
Sysdig provides next-generation monitoring for your container environments. It offers a multi-faceted approach to monitoring, with telemetry and health checking that cover the full stack, and all of this is available from a single pane of glass.
The product combines a rich Docker and Kubernetes monitoring system with deep metrics on the underlying containers. The addition of Prometheus monitoring, which aids application visibility and alerting and lends a helping hand with troubleshooting, is another compelling argument for the product.
We’ll dive into the Monitor product and look at these seven areas:
- Full-stack data
Sysdig Full-Stack Data
Full stack monitoring provides monitoring of metrics across all of your Kubernetes infrastructure: applications, network, physical hosts and file systems. It provides insight into your deployment health and performance. The ability to see metrics across the breadth of your deployment is a powerful tool.
At the infrastructure layer you can view your entire infrastructure, grouped by cluster:
The following graphic shows the health of your overall Kubernetes installation: which nodes are pushing their limits, how many containers a node is running, and so on.
Moving up the stack, you can monitor the actual applications that are running within the containers. This graphic shows an overview of your Cassandra environment; standard queries also allow you to view by node.
The performance insights include response time, latency, request count, and error count.
Out of the box, Sysdig Monitor can auto-detect several applications and provide instant dashboards and metric views for common applications and services such as Cassandra, HAProxy, Istio, MongoDB, MySQL, and NGINX.
Out of the box you can also obtain insight into your network connections, including ingress and egress for any process, container, service, and pod. Monitor and troubleshoot link performance and bottlenecks to solve network issues and ensure app service levels.
And that is just out of the box with default settings. Start digging into customization a little and you can collect Prometheus, StatsD, and JMX metrics from every app and container without server endpoints or complex configs. Couple this with the ability to aggregate data by microservice or cluster on the fly and you can see that this is a powerful solution. But, in the parlance of the shopping channels, “that’s not all!”
Prometheus is an open source monitoring platform that couples a scalable, multi-dimensional data model with a powerful query language, and it is commonly used with Kubernetes and container deployments to handle alerts and monitoring. At first glance its inclusion here may seem counterintuitive, as Sysdig and Prometheus appear to be direct competitors. This, however, is not the case.
Sysdig Monitor can read native Prometheus metrics and fully understands exporters, but it layers further capabilities on top of that output: additional metric types such as JMX, enterprise-class data management, scale, integrations, and support. So use native Prometheus for your development environments, and Sysdig Monitor for the more complex environments of staging and production, where typically DevOps or Platform Ops is responsible for the software; where there is a mix of custom metric types; where there is a desire for troubleshooting as well as monitoring data; and where there are requirements around long-term data retention, management, and user access controls.
Sysdig then presents this data in a single dashboard for consumption, rather than making you use multiple access points. Where Sysdig starts to pull ahead is that native Prometheus requires you to configure an exporter per pod, as a dedicated container. This obviously increases management overhead and adds another layer of complexity, and any extra complexity is another pain point for operational staff. By contrast, Sysdig Monitoring is host-based rather than container- or pod-driven, so the amount of wasted resources is minimized and design complexity is lowered.
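To make the multi-dimensional data model a little more concrete, here is a minimal Python sketch that reads a few samples in the Prometheus text exposition format and sums them by a label, which is roughly what aggregating "by microservice" means. The metric name, the label names (`service`, `pod`), and the values are invented for illustration, the parser is deliberately simplified (no escaped quotes, no timestamps), and none of this is meant to represent how Sysdig Monitor or Prometheus work internally.

```python
import re
from collections import defaultdict

# Hypothetical scrape output in the Prometheus text exposition format.
# The metric name, labels, and values are made up for this example.
SAMPLE_SCRAPE = """\
# HELP http_requests_total Total HTTP requests served.
# TYPE http_requests_total counter
http_requests_total{service="cart",pod="cart-7d9f"} 120
http_requests_total{service="cart",pod="cart-x2k4"} 80
http_requests_total{service="search",pod="search-a1b2"} 300
"""

# Simplified patterns: metric name, optional {labels}, then the value.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_PAIR.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def sum_by_label(samples, metric, label):
    """Sum one metric's values, grouped by the value of a single label."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == metric and label in labels:
            totals[labels[label]] += value
    return dict(totals)


if __name__ == "__main__":
    samples = parse_samples(SAMPLE_SCRAPE)
    print(sum_by_label(samples, "http_requests_total", "service"))
    # prints {'cart': 200.0, 'search': 300.0}
```

Grouping by `pod` instead of `service` would give one total per container, which is the same data viewed along a different dimension.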
With the introduction of Kubernetes into your infrastructure, traffic flows become more complex. With Sysdig monitoring you can visualize your infrastructure and services using topology maps. See traffic flows, identify bottlenecks, and understand interdependencies to streamline microservice management.
Dashboards are your window into the infrastructure. They are a well-known paradigm for presenting consolidated data in a useful format for at-a-glance consumption. Sysdig Monitoring provides over 75 out-of-the-box dashboards to get you started on your journey. However, the power of dashboards shines brightest in the ability to combine multiple inputs from several sources in a single interface, so that you can present your data in a way that is pertinent to your needs. Couple this with the ability to share dashboards between team members, and even publicly via read-only URLs, and you have an awesome way to provide real-time metrics to your customers and clients.
Metrics are all well and good, but the power is in what you do with them. Alerting provides instant responses to anomalies. Alerts can be configured across nodes, namespaces, clusters, and even individual metrics and tags. Once an alert has been triggered, notifications can be sent to your incident management tools to automatically raise a ticket and kick off that workflow, change the level of monitoring to capture deeper information for troubleshooting, or even trigger an automation task to expand, contract, or restart the offending instance.
Good troubleshooting is a skill to be worshipped. You would think it would be easy with all the information that is available for consumption; however, this information is often not in a format that is easily consumed. Sysdig Monitor provides deep insights at the system call level to enable more effective troubleshooting by visualizing outputs in a readable format.
Traffic light indicators allow you to zero in quickly on problem areas. At a time when DevOps, development, and operations teams are swamped in data, making sense of it is becoming harder and harder, even with pretty data analytics tools, and the move to microservices and containers has only made this information overload greater.
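As a rough illustration of the alerting flow described above, the Python sketch below evaluates a threshold rule against samples that fall inside a scope (here a namespace) and fans the result out to notification channels. Everything in it is hypothetical: the `AlertRule` and `Sample` classes, the channel names, and the metric name are assumptions made for the example and are not Sysdig Monitor's API or configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    metric: str
    value: float
    labels: dict


@dataclass
class AlertRule:
    """Hypothetical rule: fire when `metric` exceeds `threshold` inside `scope`."""
    name: str
    metric: str
    threshold: float
    scope: dict = field(default_factory=dict)   # labels a sample must match
    channels: tuple = ("ticketing",)            # where notifications are sent


def in_scope(sample, rule):
    """A sample is in scope when every scope label matches exactly."""
    return all(sample.labels.get(key) == value for key, value in rule.scope.items())


def evaluate(rule, samples):
    """Return one (channel, rule_name, pod, value) tuple per channel per breach."""
    notifications = []
    for sample in samples:
        breached = (
            sample.metric == rule.metric
            and in_scope(sample, rule)
            and sample.value > rule.threshold
        )
        if breached:
            pod = sample.labels.get("pod", "unknown")
            for channel in rule.channels:
                notifications.append((channel, rule.name, pod, sample.value))
    return notifications


if __name__ == "__main__":
    rule = AlertRule(
        name="high-cpu",
        metric="cpu_used_percent",
        threshold=90.0,
        scope={"namespace": "prod"},
        channels=("ticketing", "restart-hook"),
    )
    samples = [
        Sample("cpu_used_percent", 95.0, {"namespace": "prod", "pod": "api-1"}),
        Sample("cpu_used_percent", 40.0, {"namespace": "prod", "pod": "api-2"}),
        Sample("cpu_used_percent", 99.0, {"namespace": "dev", "pod": "api-3"}),
    ]
    for notification in evaluate(rule, samples):
        print(notification)
```

Only the `prod` pod above the threshold produces notifications; the `dev` pod is ignored because it falls outside the rule's scope, however busy it is.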
Deep insight allows troubleshooting of containers even after the errant container has been destroyed.
Sysdig Teams is a form of service-based access control. It allows role-based access to the dashboards and information that are pertinent to your role, rather than deluging you with information.
Service-based access control is a concept similar to role-based access control. It allows you to define your dashboards based on your business need, rather than accepting a sanitized view of your business defined by the vendor.
For example, you might have one dashboard that monitors database containers for your DBAs and a different one for the developers responsible for a particular application. This allows different logic based on what each team deems its most important metrics. Your platform operations teams, in turn, require a more holistic overview; they track completely different metrics, so their dashboards have a completely different look and feel.
Sysdig Teams is new functionality that allows you to organize users into teams, to enforce data access security policies and improve troubleshooting workflow. All teams are isolated from each other, which limits the exposure of dashboards and alerts and controls access to infrastructure, service, and application performance data. Teams simplify the process of a user getting from system login to the data they need right now.
Once configured, Teams dynamically filters your metrics based on metadata already present in your system. Sysdig already retrieves metadata from your orchestration system to aggregate Docker container monitoring data into views for your deployments, services, and tasks.
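The idea of filtering metrics by metadata can be sketched in a few lines of Python. This is only a conceptual model under stated assumptions: the `Team` class, the `scope` mapping, and the label names (`namespace`, `app`) are hypothetical and do not reflect how Sysdig Teams is actually configured or implemented.

```python
from dataclasses import dataclass


@dataclass
class Team:
    """Hypothetical team: `scope` maps a metadata label to its allowed values."""
    name: str
    scope: dict


def visible_to(team, series):
    """Keep only the series whose metadata satisfies every entry in the team scope."""
    return [
        item
        for item in series
        if all(
            item["labels"].get(label) in allowed
            for label, allowed in team.scope.items()
        )
    ]


if __name__ == "__main__":
    # Made-up series, tagged with orchestrator-style metadata.
    series = [
        {"metric": "query_latency_ms", "labels": {"namespace": "prod", "app": "mysql"}},
        {"metric": "query_latency_ms", "labels": {"namespace": "prod", "app": "cassandra"}},
        {"metric": "request_count", "labels": {"namespace": "prod", "app": "checkout"}},
        {"metric": "request_count", "labels": {"namespace": "dev", "app": "checkout"}},
    ]

    dbas = Team("dbas", {"app": {"mysql", "cassandra"}})
    checkout_devs = Team("checkout-devs", {"app": {"checkout"}, "namespace": {"dev"}})
    platform_ops = Team("platform-ops", {})  # empty scope: sees everything

    for team in (dbas, checkout_devs, platform_ops):
        print(team.name, len(visible_to(team, series)))
```

Because the scope is expressed in terms of labels the orchestrator already attaches, a new container that carries a matching label would fall into the right team's view without any extra per-container setup.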
Monitoring an environment that uses Kubernetes to orchestrate containers in a multi-cloud world is complicated, and we need new tools to solve the issues of vastly increased data inputs. The old monolithic monitoring solutions are just not up to the task; any container monitoring added to them will be a bolt-on to an already creaking monolithic architecture and will not be able to cope. Sysdig Monitoring is one of a new generation of monitoring solutions built with cloud-native needs in mind. The available features are all geared towards monitoring in a DevOps world where containers are at the core. It is not a panacea for everything, but for containers and Kubernetes it’s da bomb.