Wait For
Kubernetes clusters and similar dynamic systems may experience temporary discrepancies between the actual and intended state of resources.
For example, a deployment could momentarily appear unhealthy during a scaling operation.
If alerts are configured for config.unhealthy events, these transient state fluctuations might lead to an overwhelming number of unnecessary notifications.
To address this issue, use the waitFor parameter. It defines a delay before notifications are sent for specific events. After an event occurs, the system waits for the specified period and then rechecks the status. A notification is sent only if the undesired state persists.
waitFor applies only to notifications for health-related events.
notify-unhealthy-deployments.yaml
apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
  name: deployment-unhealthy-alerts
spec:
  events:
    - config.unhealthy
  waitFor: 2m
  filter: config.type == 'Kubernetes::Deployment'
  to:
    email: alerts@acme.com
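The delayed re-check that waitFor performs can be modeled in a few lines. This is a minimal sketch of the idea only, not Mission Control's implementation: the function name should_notify, the injected sleep callable, and the health strings are all hypothetical.

```python
from typing import Callable


def should_notify(check_health: Callable[[], str],
                  wait_seconds: float,
                  sleep: Callable[[float], None]) -> bool:
    """Return True only if the resource is still unhealthy after the wait."""
    sleep(wait_seconds)                    # plays the role of waitFor
    return check_health() == "unhealthy"   # re-check the current state


# Hypothetical deployment that recovers during the wait (e.g. a scale-up finishing).
states = iter(["healthy"])
transient = should_notify(lambda: next(states), 120, sleep=lambda s: None)

# Hypothetical deployment that is still unhealthy after the wait.
persistent = should_notify(lambda: "unhealthy", 120, sleep=lambda s: None)

print(transient, persistent)
```

Passing the sleep function in as an argument keeps the sketch instant to run; a real caller would pass time.sleep or use a scheduler.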
waitFor re-evaluates the health based on the current state in config-db.
However, in some circumstances there may be a delay between when a change occurs and when it is reflected in config-db, potentially resulting in false positives.
waitForEvalPeriod forces an incremental scrape of the resource before sending a notification.
It waits for up to this period for a scrape to complete before sending a notification.
apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
  name: deployment-unhealthy-alerts
spec:
  events:
    - config.unhealthy
  waitFor: 5m
  waitForEvalPeriod: 30s
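The "scrape first, then re-evaluate" step can be sketched as a bounded wait on a completion signal. This is an illustrative model under stated assumptions: recheck_after_scrape, the store dictionary standing in for config-db, and the scrape callbacks are hypothetical. The fallback to the last stored state when the scrape does not finish in time is also an assumption, since the text above only says the system waits up to the period.

```python
import threading
from typing import Callable, Dict, Tuple


def recheck_after_scrape(store: Dict[str, str],
                         trigger_scrape: Callable[[threading.Event], None],
                         eval_period_seconds: float) -> Tuple[str, bool]:
    """Request a scrape, wait up to eval_period_seconds, then read the health."""
    done = threading.Event()
    trigger_scrape(done)                                   # scraper sets `done` when finished
    finished = done.wait(timeout=eval_period_seconds)      # plays the role of waitForEvalPeriod
    return store["health"], finished


# Stale record: the stored state says unhealthy, but the resource has recovered.
store = {"health": "unhealthy"}


def fast_scrape(done: threading.Event) -> None:
    store["health"] = "healthy"   # the scrape refreshes the stored state
    done.set()


fresh = recheck_after_scrape(store, fast_scrape, 0.05)

# A scrape that never completes within the period (assumed fallback: use stored state).
stale_store = {"health": "unhealthy"}
timed_out = recheck_after_scrape(stale_store, lambda done: None, 0.05)

print(fresh, timed_out)
```

In the first case the refreshed state is healthy, so the false positive is avoided; in the second the wait is capped, so a slow scrape cannot delay the decision indefinitely.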
Grouping Notifications
Multiple related notifications may be generated within a short time window. Instead of sending each alert separately, you can use notification grouping to consolidate multiple events into a single message.
Example: When a Kubernetes deployment becomes unhealthy, its replicaset and associated pods will also become unhealthy.
If you have a notification set up to alert on config.unhealthy, you'll receive at least three notifications for the same cause.
The groupBy parameter allows you to define how notifications should be grouped.
Grouping can be done via:
- type (type of the config)
- description
- status_reason
- labels, in the format label:app
- tags, in the format tag:namespace
Grouping only works with waitFor, so a waitFor duration is required.
apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
  name: config-health
spec:
  events:
    - config.unhealthy
    - config.warning
  waitFor: 2m
  waitForEvalPeriod: 30s
  groupBy:
    - label:app
  to:
    connection: connection://default/slack
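To illustrate what grouping by label:app does to the deployment, replicaset, and pod events from the example, here is a small model of the grouping step. It is a sketch under stated assumptions rather than the actual grouping code: the event dictionaries, the resource names, and the functions group_key and group_events are hypothetical, and the label: prefix follows the groupBy entry in the configuration above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_key(event: dict, group_by: List[str]) -> Tuple[str, ...]:
    """Build a grouping key from fields such as 'type', 'label:app' or 'tag:namespace'."""
    parts = []
    for field in group_by:
        if field.startswith("label:"):
            parts.append(event.get("labels", {}).get(field[len("label:"):], ""))
        elif field.startswith("tag:"):
            parts.append(event.get("tags", {}).get(field[len("tag:"):], ""))
        else:
            parts.append(str(event.get(field, "")))
    return tuple(parts)


def group_events(events: List[dict], group_by: List[str]) -> Dict[Tuple[str, ...], List[str]]:
    """Collect the names of events that share the same grouping key."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for event in events:
        groups[group_key(event, group_by)].append(event["name"])
    return dict(groups)


# Hypothetical config.unhealthy events collected during the waitFor window.
events = [
    {"name": "deployment/api", "type": "Kubernetes::Deployment", "labels": {"app": "api"}},
    {"name": "replicaset/api-7d9f", "type": "Kubernetes::ReplicaSet", "labels": {"app": "api"}},
    {"name": "pod/api-7d9f-x2k", "type": "Kubernetes::Pod", "labels": {"app": "api"}},
    {"name": "pod/web-5c6b-q8z", "type": "Kubernetes::Pod", "labels": {"app": "web"}},
]

grouped = group_events(events, ["label:app"])
for key, names in grouped.items():
    print(key, names)
```

The three events that share app=api collapse into one group, so they would produce a single message instead of three, while the unrelated web pod stays in its own group.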