Monitoring is very important, especially in DevOps. We monitor not only the system with all its characteristics, but also team behavior, to support improvements.
Areas of monitoring are:
In modern IT systems, knowing what is going on in your system is crucial. Is the IT system still behaving as expected? IT systems consist of multiple components that need to interface and cooperate to work correctly and deliver the pursued business value. To create this insight, logging, tracing, and metrics should be in place throughout the IT system, including its infrastructure components.
The term observability is a container concept for various subjects around the state of an IT system. Logging, tracing, and metrics are the cornerstones of observability. The information they provide is combined to create a coherent view of the state of the IT system. The data is used for reactive and pro-active maintenance, auditability, controllability, and debugging purposes.
In the section about logging & tracing we already refer to the importance of logging functional system indicators. To be able to use logging for observability purposes, one additional requirement needs to be met: the logging should contain contextual information.
Every log entry should refer to a request or an autonomous work package, e.g., customer-, contract-, or order-id. Log entries without this contextual information only clutter the log store and don’t have additional value. Although contextual data is critical to apply effective observability, remember that GDPR rules need to be adhered to, so not all desired references may be allowed.
The log entries should be collected and stored in chronological order and have the correct severity level. Keep in mind that observability serves multiple purposes and thus needs different severities to serve those purposes. E.g., debug level log entries are helpful for anomaly analysis but are not required to get insight into the overall health of a service or component.
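As a rough illustration of contextual logging with severity levels, the sketch below uses Python's standard `logging` module. The logger name `orders`, the `order_id` field, and the value `ORD-1001` are assumptions made for this example; an opaque order id is used rather than personal data, in line with the GDPR remark above.

```python
import io
import json
import logging


class JsonContextFormatter(logging.Formatter):
    """Render each record as one JSON line that includes the context field."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            # order_id is a hypothetical context field passed via `extra`.
            "order_id": getattr(record, "order_id", None),
        }
        return json.dumps(entry)


def build_logger(stream, level=logging.INFO):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("orders")
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonContextFormatter())
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger


stream = io.StringIO()
# LoggerAdapter attaches the same context to every entry it emits.
log = logging.LoggerAdapter(build_logger(stream), {"order_id": "ORD-1001"})
log.info("payment accepted")
log.debug("raw gateway response received")  # filtered out: below INFO

lines = [json.loads(line) for line in stream.getvalue().splitlines()]
```

With the level set to INFO, only the first entry is stored. Lowering the level to DEBUG during anomaly analysis would keep both, each still carrying the order id.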
A metric is a specific log entry that gives information about a predefined activity or a technical process. There are metrics for functional and non-functional items, which serve different purposes and goals. Common metrics collected by telemetry follow the RED method [Yocum 2021]: rate (requests per second), errors (number of failed requests), and duration (time each request takes).
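RED stands for rate, errors, and duration. A minimal sketch of how such metrics might be derived from a batch of request records follows. The `RequestRecord` type, the `red_metrics` function, and the sample values are hypothetical and only serve the illustration; a real system would typically rely on a telemetry library.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RequestRecord:
    duration_ms: float  # how long the request took
    failed: bool        # whether the request ended in an error


def red_metrics(records, window_seconds):
    """Summarise rate, errors, and duration for one observation window."""
    total = len(records)
    errors = sum(1 for r in records if r.failed)
    durations = [r.duration_ms for r in records]
    return {
        "rate_per_s": total / window_seconds,
        "error_ratio": errors / total if total else 0.0,
        "duration_ms_median": median(durations) if durations else 0.0,
        "duration_ms_max": max(durations) if durations else 0.0,
    }


# Four made-up requests observed in a two-second window.
sample = [
    RequestRecord(120.0, False),
    RequestRecord(80.0, False),
    RequestRecord(300.0, True),
    RequestRecord(100.0, False),
]
summary = red_metrics(sample, window_seconds=2)
```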
Tracing is the practice of tracking (following) a request or autonomous work package throughout the IT system. Following a request through multiple components of the IT system requires a trace ID to which all information can be bound. Tracing is used for different goals:
Warning: Generating high volumes of trace data (especially useless trace data) can have severe negative effects on the IT system, e.g., performance issues.
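The idea of binding all information to one trace ID can be sketched with Python's standard `contextvars` module. The component names (`gateway`, `inventory`, `payment`), the `collected` list standing in for a trace store, and all function names are assumptions for this example, not a prescribed design.

```python
import contextvars
import uuid

# Holds the trace ID of the request currently being handled.
_trace_id = contextvars.ContextVar("trace_id", default=None)

collected = []  # stand-in for a real trace store


def start_trace():
    """Create a new trace ID and make it current for this request."""
    trace_id = uuid.uuid4().hex
    _trace_id.set(trace_id)
    return trace_id


def record_span(component, action):
    """Store one step of the request, bound to the current trace ID."""
    collected.append(
        {"trace_id": _trace_id.get(), "component": component, "action": action}
    )


def check_stock():
    record_span("inventory", "check_stock")


def charge_customer():
    record_span("payment", "charge_customer")


def handle_order():
    """Entry point: every component called from here shares one trace ID."""
    trace_id = start_trace()
    record_span("gateway", "receive_order")
    check_stock()
    charge_customer()
    return trace_id


trace_id = handle_order()
```

Only one entry per component step is recorded here. Keeping spans this coarse is one way to limit trace volume, in line with the warning above.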
Combining all the information from metrics, logs, and traces into a single dashboard delivers valuable information to the complete cross-functional DevOps team. Different views can be configured to fit the needs of a team member.
Alerting systems notify appropriate team members if thresholds are exceeded or too many anomalies are detected (see Reporting & Alerting).
Why should we care about observability? Well, first of all, we release to many environments, so this could mean better support in development, testing and production environments. But, of course, it also means empowering the team’s ability to understand the production situation as there are always interesting new behaviors uncovered by real users under real load, and we all should be listening for them.
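The threshold-based alerting mentioned above could look roughly like the sketch below. The metric names, the limits, and the `evaluate_alerts` function are hypothetical; real alerting tools add routing, deduplication, and anomaly detection on top of a check like this.

```python
def evaluate_alerts(metrics, thresholds):
    """Return one alert per metric whose value exceeds its threshold."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append({"metric": name, "value": value, "limit": limit})
    return alerts


# Made-up current values and limits for one service.
current = {"error_ratio": 0.25, "duration_ms_median": 110.0}
limits = {"error_ratio": 0.05, "duration_ms_median": 500.0}
alerts = evaluate_alerts(current, limits)
```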
Sources: [Yocum 2021] The RED method: A new strategy for monitoring microservices, Tim Yocum Euteneuer, 4 November 2021.