Alert Rules

Configure rules through a graphical interface under Alert Management → Alert Rules. Common scenarios require no PromQL (custom expressions are also supported).

Caution: Thresholds and notification policies should be owned by operations staff, because misconfiguration can cause false positives or missed alerts. Validate in a test environment before rolling out to production.

Monitoring Types

The rules page is organized into tabs by type. Middleware types appear only after the corresponding data source is enabled:

| Tab | What It Monitors | Shown When |
| --- | --- | --- |
| Basic Resources | CPU, memory, disk, and other metrics of hosts / middleware; supports custom PromQL | Always |
| Port | TCP port connectivity probing | Always |
| SSL | HTTPS certificate validity period | Always |
| MySQL | Connections / slow queries and other specialized checks | MySQL data source enabled |
| MongoDB | Connections / replication lag and other specialized checks | MongoDB data source enabled |
| Kafka Backlog | Consumer group lag buildup | Kafka data source enabled |
| Kafka Rebalance | Frequent consumer group rebalancing | Kafka data source enabled |
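
To make the Kafka Backlog row concrete, here is a minimal sketch of how consumer group lag is commonly computed: the log end offset minus the group's committed offset, per partition. The function name and input shapes are hypothetical, and how the platform collects these offsets is not described here.

```python
from typing import Dict


def consumer_group_lag(
    end_offsets: Dict[int, int],
    committed_offsets: Dict[int, int],
) -> Dict[int, int]:
    """Return per-partition lag: log end offset minus committed offset.

    Both arguments map partition number -> offset.
    """
    lag: Dict[int, int] = {}
    for partition, end in end_offsets.items():
        # Simplifying assumption: a partition with no committed offset
        # is treated as if the group had consumed nothing yet.
        committed = committed_offsets.get(partition, 0)
        # Floor at zero so a stale end-offset reading never yields negative lag.
        lag[partition] = max(0, end - committed)
    return lag


lag = consumer_group_lag({0: 100, 1: 50}, {0: 90})
total_lag = sum(lag.values())
```

A backlog rule would then compare a value such as `total_lag` against the configured threshold.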

Creating a Rule

Click New Task and fill in the fields as prompted:

| Field | Description |
| --- | --- |
| Task name | Unique and readable, e.g. Production host CPU too high |
| Data source | The target instance to monitor |
| Metric | Choose from built-in metrics, or select Custom PromQL to write an expression directly (results can be previewed) |
| Alert threshold | Operator + threshold, e.g. > 80 |
| Check frequency | How often the check runs |
| Duration | How long the value must stay over threshold before entering the alert state, filtering out transient spikes |
| Notification channel | Bind a notification channel; if none is bound, the alert is only recorded in the platform |
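
The fields above can be pictured as a small record plus a threshold comparison. The sketch below is illustrative only: the class name, attribute names, the example data source and metric names, and the set of supported operators are all assumptions, not the platform's actual schema.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Assumed operator set; the platform may support a different one.
_OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical in-memory form of the New Task fields."""

    task_name: str
    data_source: str
    metric: str
    op: str                 # comparison operator, e.g. ">"
    threshold: float        # e.g. 80
    check_interval_s: int   # check frequency, in seconds
    duration_s: int         # how long the breach must last
    channel: Optional[str] = None  # None: alert is only recorded in the platform

    def breaches(self, value: float) -> bool:
        """Return True when the value is over the threshold for this rule."""
        return _OPERATORS[self.op](value, self.threshold)


rule = AlertRule(
    task_name="Production host CPU too high",
    data_source="prod-host-01",      # hypothetical instance name
    metric="cpu_usage_percent",      # hypothetical metric name
    op=">",
    threshold=80,
    check_interval_s=60,
    duration_s=300,
)
```

With this rule, `rule.breaches(85.0)` is true and `rule.breaches(80.0)` is false, because the comparison is strict.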

State Transitions

A built-in noise-reduction state machine notifies only on state changes, so a continuously firing alert does not repeatedly spam notifications:

Normal ──over threshold──▶ Warning (Pending) ──duration met──▶ Firing ──recovered──▶ Recovered

| State | Meaning |
| --- | --- |
| Warning (Pending) | Over threshold, duration not yet met |
| Firing | Over threshold and duration met; notification already sent |
| Recovered | Recovered from firing to normal; recovery notification sent |
| Paused | Rule manually paused; no longer scheduled |
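
The transitions above can be sketched as a small state machine that records a notification only when the state changes to Firing or Recovered. This is a sketch under stated assumptions, not the platform's implementation: the class and method names are hypothetical, and two behaviors are assumed rather than documented (a Pending rule that drops back under threshold returns to Normal silently, and a Recovered rule returns to Normal on the next healthy check).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class State(Enum):
    NORMAL = "Normal"
    PENDING = "Warning (Pending)"
    FIRING = "Firing"
    RECOVERED = "Recovered"
    PAUSED = "Paused"


@dataclass
class RuleState:
    duration_s: float                       # the rule's Duration field
    state: State = State.NORMAL
    breach_started: Optional[float] = None  # when the current breach began
    notifications: List[str] = field(default_factory=list)

    def observe(self, over_threshold: bool, now: float) -> State:
        """Feed one check result (taken at time `now`, in seconds)."""
        if self.state is State.PAUSED:
            return self.state  # paused rules are not scheduled
        if over_threshold:
            if self.state in (State.NORMAL, State.RECOVERED):
                self.state = State.PENDING
                self.breach_started = now
            if (self.state is State.PENDING
                    and now - self.breach_started >= self.duration_s):
                self.state = State.FIRING
                self.notifications.append("firing")
            # Already FIRING: stay there without notifying again.
        else:
            if self.state is State.FIRING:
                self.state = State.RECOVERED
                self.notifications.append("recovered")
            elif self.state in (State.PENDING, State.RECOVERED):
                # Assumption: transient spikes and settled recoveries
                # fall back to Normal without any notification.
                self.state = State.NORMAL
            self.breach_started = None
        return self.state


rule = RuleState(duration_s=120)
history = [
    rule.observe(True, 0),     # breach begins
    rule.observe(True, 60),    # still within the duration window
    rule.observe(True, 120),   # duration met
    rule.observe(True, 180),   # still firing, no repeat notification
    rule.observe(False, 240),  # back under threshold
]
```

After this sequence, `rule.notifications` holds exactly one firing entry and one recovery entry, even though the rule was over threshold for four consecutive checks.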

Common Operations

  • Check now: skip the wait and run a check immediately.
  • Pause / enable: temporarily stop scheduling during planned maintenance (replaces the old "alert silence").
  • Instance details: view the current value for each instance the rule matches; supports per-instance silencing to mute individual instances while the rule as a whole keeps running.
  • Alert history: view this rule's trigger records; for the global view, see Alert History.
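
Per-instance silencing can be pictured as a filter applied after the rule has evaluated every matched instance. The sketch below is hypothetical: the function name, the instance names, and the fixed "greater than" comparison are illustrative assumptions, and the platform's actual mechanism is not described here.

```python
from typing import Dict, List, Set


def instances_to_notify(
    current_values: Dict[str, float],
    threshold: float,
    silenced: Set[str],
) -> List[str]:
    """Return instances over threshold that are not individually silenced.

    The rule still evaluates every instance; silencing only removes an
    instance from the set that would trigger a notification.
    """
    return sorted(
        name
        for name, value in current_values.items()
        if value > threshold and name not in silenced
    )


to_notify = instances_to_notify(
    {"host-a": 91.0, "host-b": 85.5, "host-c": 40.2},  # hypothetical instances
    threshold=80,
    silenced={"host-b"},
)
```

Here `host-b` is over threshold but muted, so only `host-a` remains in `to_notify`, while the rule as a whole keeps running.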