Skip to main content

Wait For

Kubernetes clusters and similar dynamic systems may experience temporary discrepancies between the actual and intended state of resources. For example, a deployment could momentarily appear unhealthy during a scaling operation. If alerts are configured for config.unhealthy events, these transient state fluctuations might lead to an overwhelming number of unnecessary notifications.

To address this issue, you can utilize the waitFor parameter. This feature allows you to define a delay before sending notifications for specific events. After an event occurs, the system rechecks its status following the specified wait period. Only if the undesired state persists does a notification trigger.

info

waitFor is only applicable to notifications of health related events

notify-unhealthy-deployments.yaml
apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
name: deployment-unhealthy-alerts
spec:
events:
- config.unhealthy
waitFor: 2m
filter: config.type == 'Kubernetes::Deployment'
to:
email: alerts@acme.com
Handling Scrape Lag

waitFor re-evaluates the health based on the current state in config-db. However, in some circumstances, there may be a delay between when a change occurs and when it's refelected in config-db, potentially resulting in false positives.

waitForEvalPeriod forces an incremental scrape of the resource before sending a notification. It waits for up to this period for a scrape to complete before sending a notification.

apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
name: deployment-unhealthy-alerts
spec:
events:
- config.unhealthy
waitFor: 5m
waitForEvalPeriod: 30s

Grouping Notifications

Multiple related notifications may be generated within a short time window. Instead of sending each alert separately, you can use notification grouping to consolidate multiple events into a single message.

Example: When a Kubernetes deployment becomes unhealthy, its replicaset and associated pods will also become unhealthy. If you have a notification set up to alert on config.unhealthy, you'll receive 3 different notifications at the very least for the same cause.

The groupBy parameter allows you to define how notifications should be grouped. Grouping can be done via

  • type (type of the config)
  • description
  • status_reason
  • labels in the format labels:app
  • tags in the format tag:namespace
info

Grouping only works with waitFor. Hence, a waitFor duration is required

apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
name: config-health
spec:
events:
- config.unhealthy
- config.warning
waitFor: 2m
waitForEvalPeriod: 30s
groupBy:
- label:app
to:
connection: connection://default/slack