The objective of constantly monitoring a system is to collect actionable feedback from it. The goals of lean management are to improve product quality, eliminate unnecessary waste, shorten production times, and reduce total costs, using feedback from stakeholders.
What is Proactive notification?
Proactive notification, also known as proactive multi-channel communication or proactive engagement, raises awareness among customers and companies by providing useful information or alerting individuals to an upcoming activity or action. For the quality of our value streams, however, we need to be alerted to defects, because defects lead to lost time, lost quality, rework, and customer complaints. To keep these defects from reaching clients, we put many code analyzers, reviews, and key performance indicators (KPIs) in place. These KPIs and metrics tools help achieve quality, but they are reactive in nature: they do not stop a defect from happening. Moreover, this expenditure on monitoring tools is itself waste for the value stream.
In the software development industry, many bugs discovered during quality assurance turn out to be extremely small and easy to prevent. The fact that these basic bugs are discovered at the testing stage shows that the developers did not perform a primary quality check of their work. There should be proactive notification of code quality at the developer's desktop so that bugs are not passed down the line. At every stage of system development, delivery, or deployment, there should be proactive notifications to prevent a defect or anomaly in the system.
How to implement Proactive notification?
In software development, developers use an integrated development environment (IDE) such as Eclipse, IntelliJ IDEA, or NetBeans. These IDEs have rudimentary code quality checks that cover basic language-specific coding conventions. They provide proactive notification to developers through simple spell checks and language coding checks that help correct code. These IDEs can be further enhanced with plugins such as Checkstyle, PMD, FindBugs, Sonar, Snyk, and Veracode, which perform hundreds of additional code, OWASP, and security vulnerability checks and give developers early warnings.
At every software, hardware, and network level, automated tools can be configured to detect and report anomalies; some of these tools even take proactive corrective measures. You should:
Use alerting rules. You should generate notifications using specific alerting rules. Alerting rules define the conditions under which an alert is generated and the notification channel for that alert. Configure alerts in logging and monitoring systems to appropriate levels.
Use thresholds. Alerting rules should use thresholds for the metrics you monitor that indicate real trouble. Monitoring thresholds trigger alerting rules, which generate notifications when metric levels cross threshold values. Configure alerts to make sure they notify people and teams who can fix the problem.
Choose thresholds carefully. Choose thresholds so that alerts are generated only when the threshold actually predicts an issue. That is, don't select a value arbitrarily, and don't set a threshold too aggressively or too leniently. For example, you might choose to trigger an alert notification when the average page response time comes within 20% of the level at which you know users start becoming frustrated and calling support.
Hold incident post-mortems. When you hold post-mortems following incidents, determine which indicators could have predicted the incident and monitor them in the future.
Plan a notification strategy. If a notification requires no action, or the same action every time, automate the response. Ensure that only relevant alerts occur and that the team isn't receiving too many of them. Proactively monitor system health using best-practice threshold warnings and rate-of-change warnings.
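The recommendations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the configuration format of any particular monitoring product: the `AlertRule` class, its field names, the metric name, the `web-oncall` channel, and the 2.0-second frustration level are all hypothetical. The sketch also assumes a metric where higher values are worse. It ties a metric to a threshold, warns at a margin before that threshold (the 20% from the response-time example), names the channel to notify, and optionally runs an automated response.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AlertRule:
    """Hypothetical alerting rule for a metric where higher values are worse."""
    metric: str
    failure_threshold: float   # level at which users are known to be affected
    warning_margin: float      # fraction below the threshold at which to warn
    channel: str               # who gets notified (people who can fix it)
    auto_response: Optional[Callable[[], None]] = None  # optional automated action

    def warning_level(self) -> float:
        # Warn before the failure threshold so there is time to react.
        return self.failure_threshold * (1.0 - self.warning_margin)

    def evaluate(self, value: float) -> Optional[str]:
        """Return a notification message if value reaches the warning level, else None."""
        if value < self.warning_level():
            return None
        if self.auto_response is not None:
            # Same action every time: run it automatically.
            self.auto_response()
        return (
            f"[{self.channel}] {self.metric} is {value}, "
            f"warning level {self.warning_level():.2f}, "
            f"failure threshold {self.failure_threshold}"
        )


# Assumed example: users get frustrated at a 2.0 s average page response time,
# so warn once the average comes within 20% of that level.
rule = AlertRule(
    metric="avg_page_response_seconds",
    failure_threshold=2.0,
    warning_margin=0.20,
    channel="web-oncall",
)
print(rule.evaluate(1.0))  # well below the warning level: no notification
print(rule.evaluate(1.7))  # inside the 20% margin: a notification message
```

Keeping the warning margin separate from the failure threshold makes the lead time explicit: widening the margin gives more time to react but produces more notifications.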
The proactive quality improvement process starts working when everyone on the team (including the leaders) starts thinking about their work in the context of Lean values. The continuous improvement culture needs to start influencing every work process, decision, and policy that shapes the workflow and the quality control system.
How does it help us?
By the time QA performs its tests, the bugs that the developers failed to identify are already in the code. This wastes the time of both testers and developers, creates unnecessary iterations, and steals attention from more important issues.
Every time a task goes back for another iteration (rework, bug fix) solely because of negligence, it is waste. Some waste is unavoidable, but when you proactively seek improvement opportunities, you try to minimize the preventable waste. Proactive notification at the earlier stages avoids most of the minor issues. Applying proactive continuous quality improvement principles here means everyone on the development teams starts thinking about value at the level of the whole flow.
Common pitfalls when setting up proactive notifications include:
Setting the notification threshold too high, so that notification and failure occur at almost the same time, leaving no window to react.
Not setting notifications on the rate of change of an indicator.
Setting the alert threshold too low, so that there is a deluge of notifications.
A deluge of notifications during an event might be distracting rather than useful. When people are exposed to a large number of alarms, they can become desensitized to them (a problem known as "alert fatigue") leading to longer response times or missed alarms.
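One possible way to limit such a deluge, sketched here as an assumption rather than a prescribed method, is to suppress repeats of the same alert for a cooldown period. The `AlertThrottle` class, the alert keys, and the 300-second cooldown below are hypothetical names and values chosen for illustration.

```python
from typing import Dict


class AlertThrottle:
    """Hypothetical helper: suppress repeats of one alert within a cooldown window."""

    def __init__(self, cooldown_seconds: float) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._last_sent: Dict[str, float] = {}  # alert key -> time last notified

    def should_notify(self, alert_key: str, now: float) -> bool:
        """Return True if a notification for alert_key should be sent at time now."""
        last = self._last_sent.get(alert_key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # same alert fired recently: stay quiet
        self._last_sent[alert_key] = now
        return True


throttle = AlertThrottle(cooldown_seconds=300)
for t in (0, 30, 60, 400):
    # Timestamps are plain seconds here; a real system would use a clock.
    print(t, throttle.should_notify("disk_usage_high", now=t))
```

The cooldown is a trade-off: too short and the flood returns, too long and a recurring problem can go unnoticed.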
Ways to measure Proactive failure notifications
Instrumenting proactive monitoring is straightforward. The components to capture are:
The extent to which failure alerts from logging and monitoring systems are captured and used.
The extent to which system health is proactively monitored using threshold warnings.
The extent to which system health is proactively monitored using rate of change warnings.
To make sure you are capturing different aspects of your system, monitor metrics in at least two different ways. For example, you might set a metric threshold that triggers alerts if a metric rises above or falls below a value over a given time window, and a rate-of-change condition that triggers alerts when a metric's rate of change is higher or lower than expected.
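The two kinds of check can be sketched side by side. This is a minimal illustration under assumptions, not a definitive implementation: the function names, the `(timestamp, value)` sample format, the use of a window average, and the disk-usage figures are all hypothetical.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, metric value)


def threshold_breached(samples: Sequence[Sample], low: float, high: float) -> bool:
    """True if the average over the window is above `high` or below `low`."""
    if not samples:
        return False
    average = sum(value for _, value in samples) / len(samples)
    return average > high or average < low


def rate_of_change(samples: Sequence[Sample]) -> float:
    """Average change per second between the first and last sample in the window."""
    if len(samples) < 2:
        return 0.0
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 == t0:
        return 0.0
    return (v1 - v0) / (t1 - t0)


def rate_breached(samples: Sequence[Sample], max_rate: float) -> bool:
    """True if the metric is rising or falling faster than `max_rate` per second."""
    return abs(rate_of_change(samples)) > max_rate


# Assumed example: disk usage in percent, sampled once a minute.
window = [(0.0, 50.0), (60.0, 56.0), (120.0, 62.0)]
print(threshold_breached(window, low=10.0, high=90.0))  # usage is still far from 90%
print(rate_breached(window, max_rate=0.05))             # but it is climbing quickly
```

In this assumed window the threshold check stays quiet while the rate-of-change check fires, which is the early warning that a threshold alone would not give.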