Alertmanager sending duplicate notifications after 'resolved' notification when running with multiple replicas #4008

Open
ktnvaish22 opened this issue Sep 1, 2024 · 15 comments

Comments


ktnvaish22 commented Sep 1, 2024

What did you do?
I am running vmalert (VictoriaMetrics) with Alertmanager, two replicas of each. I ingest the metric disk_usage with a value above the alert threshold. As soon as I receive an email from Alertmanager (about 20 minutes later in my case; see the files below), I stop the data ingestion, which also ends the threshold breach.

  • VM_rules.yaml
groups:
- name: test_group
  interval: 5m
  concurrency: 1
  rules:
    - alert: High Disk Usage
      expr: ((avg(disk_usage[5m]) by (instance)) > 95)
      for: 15m
      labels:
        severity: critical
  • alertmanager_config.yaml
global:
  resolve_timeout: 5m
  smtp_smarthost: smtp.gmail.com:587
  smtp_auth_username: XXX
  smtp_auth_password: XXX
  smtp_from: [email protected]
route:
  group_by: [alert_id, instance]
  receiver: test-default
  group_interval: 5m
  repeat_interval: 12h
receivers:
- name: test-default
  email_configs:
  - to: [email protected]
    send_resolved: true 

What did you expect to see?
Alertmanager should send one 'fired' notification, followed by one 'resolved' notification after I stop data ingestion, because the breach has ended.

What did you see instead? Under which circumstances?
Alertmanager sends a 'fired' notification, but when the ingestion (and with it the breach) stops, it sends a 'resolved' notification together with another 'fired' notification. The 'resolved' email for this duplicate 'fired' email arrives either immediately or at the next group_interval. Sometimes I also see several more duplicate 'fired'/'resolved' pairs, even after the breach has stopped.
This unexpected behaviour occurs only when Alertmanager runs with multiple (>= 2) replicas.
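
For context on why the replica count can matter: as far as I understand the clustered mode, every replica receives the same alerts, but each one delays its notification by its position in the cluster multiplied by `--cluster.peer-timeout` (15s by default), and skips sending if the gossiped notification log already covers the same alert state. The Python sketch below only illustrates that staggering. The function names and the flush time are hypothetical and are not part of Alertmanager's API.

```python
from datetime import datetime, timedelta

# Assumption: the default value of Alertmanager's --cluster.peer-timeout flag.
PEER_TIMEOUT = timedelta(seconds=15)


def notify_wait(position: int, peer_timeout: timedelta = PEER_TIMEOUT) -> timedelta:
    """Delay a replica applies before notifying: cluster position times peer timeout."""
    return position * peer_timeout


def send_times(flush: datetime, replicas: int) -> list[datetime]:
    """Earliest moment each replica would try to notify for a single group flush."""
    return [flush + notify_wait(pos) for pos in range(replicas)]


if __name__ == "__main__":
    flush = datetime(2024, 9, 4, 10, 7, 22)  # illustrative flush time
    for pos, when in enumerate(send_times(flush, 3)):
        print(f"position {pos}: notifies no earlier than {when.time()}")
```

If the replicas disagree about the alert state during that window (for example because one of them has not yet received the resolved alert), a later replica may consider its own view new and notify again.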

Environment

  • Alertmanager version:
    v0.27.0
@grobinson-grafana
Contributor

Hi! 👋 There are a number of situations where this can happen. Please share debug-level logs for both Alertmanager servers from the time this happened, and we can help work out what caused it.
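
Debug logs from a cluster are dominated by memberlist chatter, so it can help to pull out only the notification-log entries before comparing replicas. Below is a minimal Python sketch, assuming the logfmt layout that Alertmanager v0.27 prints at debug level. The sample lines are taken from the Replica-1 log posted in this thread; the helper name `timeline` is hypothetical.

```python
import re
from datetime import datetime, timezone

# Sample lines copied from the Replica-1 debug log in this thread.
SAMPLE = r'''
ts=2024-09-04T10:07:23.596Z caller=nflog.go:533 level=debug component=nflog msg="gossiping new entry" entry="entry:<group_key:\"{}:{alert_id=\\\"101\\\", instance=\\\"11792\\\"}\" receiver:<group_name:\"Mail\" integration:\"email\" > timestamp:<seconds:1725444443 nanos:484411839 > resolved_alerts:9876396428224961045 > expires_at:<seconds:1725530843 nanos:484411839 > "
ts=2024-09-04T10:07:34.405Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:07:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:34580\n"
ts=2024-09-04T10:07:38.565Z caller=nflog.go:533 level=debug component=nflog msg="gossiping new entry" entry="entry:<group_key:\"{}:{alert_id=\\\"101\\\", instance=\\\"11792\\\"}\" receiver:<group_name:\"Mail\" integration:\"email\" > timestamp:<seconds:1725444458 nanos:406291780 > firing_alerts:9876396428224961045 > expires_at:<seconds:1725530858 nanos:406291780 > "
'''

COMPONENT = re.compile(r"\bcomponent=(\w+)")
SENT_AT = re.compile(r"timestamp:<seconds:(\d+)")
KIND = re.compile(r"\b(firing|resolved)_alerts:")


def timeline(log_text):
    """Return (sent_at_utc, kinds) for every gossiped notification-log entry."""
    events = []
    for line in log_text.splitlines():
        component = COMPONENT.search(line)
        if component is None or component.group(1) != "nflog":
            continue  # skip memberlist, dispatcher and maintenance lines
        sent_at = SENT_AT.search(line)
        kinds = sorted(set(KIND.findall(line)))
        if sent_at and kinds:
            when = datetime.fromtimestamp(int(sent_at.group(1)), tz=timezone.utc)
            events.append((when, kinds))
    return events


if __name__ == "__main__":
    for when, kinds in timeline(SAMPLE):
        print(when.isoformat(), ",".join(kinds))
```

On the sample lines this yields a resolved entry at 10:07:23 UTC followed by a firing entry 15 seconds later, which is the kind of ordering worth comparing across replicas.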

Author

ktnvaish22 commented Sep 4, 2024

Hey @grobinson-grafana!!
After enabling debug logs, it took a few attempts to reproduce the issue; the behaviour is quite inconsistent.
One update: I have increased the Alertmanager replica count to 3, so I am attaching debug-level logs for all three Alertmanager servers.

  • Replica-1 Logs
ts=2024-09-04T09:15:18.191Z caller=delegate.go:236 level=debug component=cluster received=NotifyJoin node=01J6Y519SS43HS1TNCDFKVQK4M addr=10.2.0.86:9094
ts=2024-09-04T09:15:18.380Z caller=delegate.go:236 level=debug component=cluster received=NotifyJoin node=01J6Y519Z1Y5QRXZ1Z57J2YAQ5 addr=10.2.0.193:9094
ts=2024-09-04T09:16:03.118Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:16:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:16:18.134Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:16:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:43048\n"
ts=2024-09-04T09:16:33.927Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:16:33 [DEBUG] memberlist: Stream connection from=10.2.0.86:60708\n"
ts=2024-09-04T09:17:03.133Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:17:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:17:22.212Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][resolved]"
ts=2024-09-04T09:17:22.212Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][resolved]]"
ts=2024-09-04T09:17:33.937Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:17:33 [DEBUG] memberlist: Stream connection from=10.2.0.86:58034\n"
ts=2024-09-04T09:18:03.142Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:18:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:18:18.153Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:18:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:59884\n"
ts=2024-09-04T09:18:33.946Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:18:33 [DEBUG] memberlist: Stream connection from=10.2.0.86:54140\n"
ts=2024-09-04T09:19:03.152Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:19:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:20:03.161Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:20:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:20:18.171Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:20:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:53164\n"
ts=2024-09-04T09:21:03.169Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:21:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:22:03.178Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:22:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:22:18.190Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:22:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:56390\n"
ts=2024-09-04T09:23:03.187Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:23:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:23:33.995Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:23:33 [DEBUG] memberlist: Stream connection from=10.2.0.86:52622\n"
ts=2024-09-04T09:24:03.196Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:24:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:24:18.207Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:24:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:34030\n"
ts=2024-09-04T09:25:03.205Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:25:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:25:18.216Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:25:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:51056\n"
ts=2024-09-04T09:26:03.215Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:26:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:27:03.224Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:27:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:28:03.234Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:28:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:29:03.244Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:29:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:29:34.052Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:29:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:60830\n"
ts=2024-09-04T09:29:51.175Z caller=nflog.go:334 level=debug component=nflog msg="Running maintenance"
ts=2024-09-04T09:29:51.175Z caller=silence.go:413 level=debug component=silences msg="Running maintenance"
ts=2024-09-04T09:29:51.194Z caller=silence.go:421 level=debug component=silences msg="Maintenance done" duration=18.514606ms size=0
ts=2024-09-04T09:29:51.201Z caller=nflog.go:341 level=debug component=nflog msg="Maintenance done" duration=25.564376ms size=97
ts=2024-09-04T09:30:03.254Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:30:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:30:18.262Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:30:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:55886\n"
ts=2024-09-04T09:30:34.063Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:30:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:34072\n"
ts=2024-09-04T09:31:03.263Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:31:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:31:18.272Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:31:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:58896\n"
ts=2024-09-04T09:32:03.273Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:32:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:32:18.281Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:32:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:53114\n"
ts=2024-09-04T09:33:03.282Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:33:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:33:34.091Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:33:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:40670\n"
ts=2024-09-04T09:34:03.296Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:34:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:35:03.307Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:35:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:36:03.315Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:36:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:36:34.121Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:36:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:39128\n"
ts=2024-09-04T09:37:03.325Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:37:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:37:23.797Z caller=nflog.go:533 level=debug component=nflog msg="gossiping new entry" entry="entry:<group_key:\"{}:{alert_id=\\\"101\\\", instance=\\\"11792\\\"}\" receiver:<group_name:\"Mail\" integration:\"email\" > timestamp:<seconds:1725442643 nanos:644856474 > firing_alerts:9876396428224961045 > expires_at:<seconds:1725529043 nanos:644856474 > "
ts=2024-09-04T09:38:03.334Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:38:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:38:18.333Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:38:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:42824\n"
ts=2024-09-04T09:38:34.141Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:38:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:45056\n"
ts=2024-09-04T09:39:03.343Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:39:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:39:18.343Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:39:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:50148\n"
ts=2024-09-04T09:39:34.150Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:39:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:54326\n"
ts=2024-09-04T09:40:03.352Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:40:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:40:34.159Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:40:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:36866\n"
ts=2024-09-04T09:41:03.362Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:41:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:41:34.169Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:41:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:51874\n"
ts=2024-09-04T09:42:03.371Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:42:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:42:22.316Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T09:42:22.316Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T09:43:03.387Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:43:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:43:34.185Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:43:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:37012\n"
ts=2024-09-04T09:44:03.397Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:44:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:44:18.387Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:44:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:50640\n"
ts=2024-09-04T09:44:51.175Z caller=silence.go:413 level=debug component=silences msg="Running maintenance"
ts=2024-09-04T09:44:51.175Z caller=nflog.go:334 level=debug component=nflog msg="Running maintenance"
ts=2024-09-04T09:44:51.184Z caller=silence.go:421 level=debug component=silences msg="Maintenance done" duration=9.217593ms size=0
ts=2024-09-04T09:44:51.190Z caller=nflog.go:341 level=debug component=nflog msg="Maintenance done" duration=14.841717ms size=97
ts=2024-09-04T09:45:03.406Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:45:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:45:34.205Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:45:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:38530\n"
ts=2024-09-04T09:46:03.415Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:46:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:46:18.404Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:46:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:44034\n"
ts=2024-09-04T09:46:34.213Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:46:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:34160\n"
ts=2024-09-04T09:47:03.424Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:47:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:47:18.416Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:47:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:35230\n"
ts=2024-09-04T09:47:22.225Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T09:47:22.317Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T09:48:03.433Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:48:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:48:18.426Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:48:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:32888\n"
ts=2024-09-04T09:49:03.442Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:49:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:50:03.452Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:50:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:50:34.249Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:50:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:53100\n"
ts=2024-09-04T09:51:03.461Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:51:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:52:03.471Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:52:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:52:22.230Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T09:52:22.266Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T09:52:22.318Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T09:53:03.483Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:53:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:53:18.477Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:53:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:46986\n"
ts=2024-09-04T09:54:03.493Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:54:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:54:18.487Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:54:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:34140\n"
ts=2024-09-04T09:54:34.285Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:54:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:37150\n"
ts=2024-09-04T09:55:03.504Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:55:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:55:34.295Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:55:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:47950\n"
ts=2024-09-04T09:56:03.514Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:56:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:56:18.507Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:56:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:41028\n"
ts=2024-09-04T09:56:34.305Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:56:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:46186\n"
ts=2024-09-04T09:57:03.523Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:57:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:57:22.222Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T09:57:22.319Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T09:57:34.313Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:57:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:56898\n"
ts=2024-09-04T09:58:03.536Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:58:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:58:18.526Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:58:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:48068\n"
ts=2024-09-04T09:59:03.546Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:59:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:59:34.331Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:59:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:36986\n"
ts=2024-09-04T09:59:51.176Z caller=silence.go:413 level=debug component=silences msg="Running maintenance"
ts=2024-09-04T09:59:51.176Z caller=nflog.go:334 level=debug component=nflog msg="Running maintenance"
ts=2024-09-04T09:59:51.187Z caller=silence.go:421 level=debug component=silences msg="Maintenance done" duration=11.191242ms size=0
ts=2024-09-04T09:59:51.193Z caller=nflog.go:341 level=debug component=nflog msg="Maintenance done" duration=17.049962ms size=97
ts=2024-09-04T10:00:03.557Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:00:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T10:00:18.545Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:00:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:46486\n"
ts=2024-09-04T10:01:03.566Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:01:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T10:01:18.555Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:01:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:56064\n"
ts=2024-09-04T10:01:34.350Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:01:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:34366\n"
ts=2024-09-04T10:02:03.576Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:02:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T10:02:18.566Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:02:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:37756\n"
ts=2024-09-04T10:02:22.250Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T10:02:22.319Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T10:02:34.359Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:02:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:53000\n"
ts=2024-09-04T10:03:03.586Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:03:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T10:04:03.596Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:04:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T10:04:34.378Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:04:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:59244\n"
ts=2024-09-04T10:05:03.607Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:05:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T10:05:18.596Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:05:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:58196\n"
ts=2024-09-04T10:06:03.617Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:06:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T10:07:03.626Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:07:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T10:07:18.614Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:07:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:49150\n"
ts=2024-09-04T10:07:22.210Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][resolved]"
ts=2024-09-04T10:07:22.320Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][resolved]]"
ts=2024-09-04T10:07:23.596Z caller=nflog.go:533 level=debug component=nflog msg="gossiping new entry" entry="entry:<group_key:\"{}:{alert_id=\\\"101\\\", instance=\\\"11792\\\"}\" receiver:<group_name:\"Mail\" integration:\"email\" > timestamp:<seconds:1725444443 nanos:484411839 > resolved_alerts:9876396428224961045 > expires_at:<seconds:1725530843 nanos:484411839 > "
ts=2024-09-04T10:07:34.405Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:07:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:34580\n"
ts=2024-09-04T10:07:38.565Z caller=nflog.go:533 level=debug component=nflog msg="gossiping new entry" entry="entry:<group_key:\"{}:{alert_id=\\\"101\\\", instance=\\\"11792\\\"}\" receiver:<group_name:\"Mail\" integration:\"email\" > timestamp:<seconds:1725444458 nanos:406291780 > firing_alerts:9876396428224961045 > expires_at:<seconds:1725530858 nanos:406291780 > "
ts=2024-09-04T10:07:53.606Z caller=notify.go:860 level=debug component=dispatcher receiver=Mail integration=email[0] aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" alerts="[High Core Usage[14f2e3b][resolved]]" msg="Notify success" attempts=1 duration=1.285027004s
ts=2024-09-04T10:08:03.637Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:08:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T10:08:18.624Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:08:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:54376\n"
ts=2024-09-04T10:09:03.646Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:09:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T10:09:18.633Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:09:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:47128\n"
ts=2024-09-04T10:10:03.656Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:10:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T10:10:18.643Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:10:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:46832\n"
ts=2024-09-04T10:11:03.667Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:11:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T10:11:34.444Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:11:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:35170\n"
ts=2024-09-04T10:12:03.675Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:12:03 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
  • Replica-2 Logs
ts=2024-09-04T09:16:03.120Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:16:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:55960\n"
ts=2024-09-04T09:16:18.131Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:16:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:17:03.135Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:17:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:32810\n"
ts=2024-09-04T09:17:18.141Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:17:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:18:18.151Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:18:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:19:03.154Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:19:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:42752\n"
ts=2024-09-04T09:19:18.160Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:19:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:19:33.957Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:19:33 [DEBUG] memberlist: Stream connection from=10.2.0.86:33336\n"
ts=2024-09-04T09:20:18.168Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:20:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:20:33.967Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:20:33 [DEBUG] memberlist: Stream connection from=10.2.0.86:60920\n"
ts=2024-09-04T09:21:18.177Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:21:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:21:33.976Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:21:33 [DEBUG] memberlist: Stream connection from=10.2.0.86:56376\n"
ts=2024-09-04T09:22:18.187Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:22:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:22:33.985Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:22:33 [DEBUG] memberlist: Stream connection from=10.2.0.86:58796\n"
ts=2024-09-04T09:23:03.189Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:23:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:56464\n"
ts=2024-09-04T09:23:18.196Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:23:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:24:18.204Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:24:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:24:34.004Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:24:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:41802\n"
ts=2024-09-04T09:25:18.213Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:25:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:25:34.014Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:25:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:54912\n"
ts=2024-09-04T09:26:03.218Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:26:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:45260\n"
ts=2024-09-04T09:26:18.222Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:26:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:26:34.024Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:26:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:35206\n"
ts=2024-09-04T09:27:18.232Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:27:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:27:34.034Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:27:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:37494\n"
ts=2024-09-04T09:28:03.236Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:28:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:33032\n"
ts=2024-09-04T09:28:18.241Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:28:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:28:34.042Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:28:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:43504\n"
ts=2024-09-04T09:29:03.246Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:29:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:39050\n"
ts=2024-09-04T09:29:18.249Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:29:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:29:48.171Z caller=silence.go:413 level=debug component=silences msg="Running maintenance"
ts=2024-09-04T09:29:48.171Z caller=nflog.go:334 level=debug component=nflog msg="Running maintenance"
ts=2024-09-04T09:29:48.182Z caller=silence.go:421 level=debug component=silences msg="Maintenance done" duration=11.13826ms size=0
ts=2024-09-04T09:29:48.188Z caller=nflog.go:341 level=debug component=nflog msg="Maintenance done" duration=16.797994ms size=97
ts=2024-09-04T09:30:03.256Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:30:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:60968\n"
ts=2024-09-04T09:30:18.259Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:30:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:31:18.268Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:31:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:31:34.072Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:31:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:44668\n"
ts=2024-09-04T09:32:18.278Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:32:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:32:34.081Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:32:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:59956\n"
ts=2024-09-04T09:33:18.286Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:33:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:34:03.298Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:34:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:36212\n"
ts=2024-09-04T09:34:18.296Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:34:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:34:34.101Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:34:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:60626\n"
ts=2024-09-04T09:35:03.309Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:35:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:59648\n"
ts=2024-09-04T09:35:18.304Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:35:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:35:34.110Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:35:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:48748\n"
ts=2024-09-04T09:36:03.317Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:36:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:41432\n"
ts=2024-09-04T09:36:18.313Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:36:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:37:18.321Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:37:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:37:22.207Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T09:37:22.207Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T09:37:22.249Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T09:37:23.796Z caller=nflog.go:533 level=debug component=nflog msg="gossiping new entry" entry="entry:<group_key:\"{}:{alert_id=\\\"101\\\", instance=\\\"11792\\\"}\" receiver:<group_name:\"Mail\" integration:\"email\" > timestamp:<seconds:1725442643 nanos:644856474 > firing_alerts:9876396428224961045 > expires_at:<seconds:1725529043 nanos:644856474 > "
ts=2024-09-04T09:37:34.130Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:37:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:33898\n"
ts=2024-09-04T09:38:03.336Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:38:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:60684\n"
ts=2024-09-04T09:38:18.330Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:38:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:39:03.345Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:39:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:54150\n"
ts=2024-09-04T09:39:18.341Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:39:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:40:03.354Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:40:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:50900\n"
ts=2024-09-04T09:40:18.349Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:40:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:41:18.358Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:41:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:42:18.367Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:42:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:42:22.208Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T09:42:34.176Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:42:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:37686\n"
ts=2024-09-04T09:43:18.376Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:43:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:44:03.399Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:44:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:48502\n"
ts=2024-09-04T09:44:18.384Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:44:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:44:34.194Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:44:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:58822\n"
ts=2024-09-04T09:44:48.171Z caller=silence.go:413 level=debug component=silences msg="Running maintenance"
ts=2024-09-04T09:44:48.171Z caller=nflog.go:334 level=debug component=nflog msg="Running maintenance"
ts=2024-09-04T09:44:48.187Z caller=silence.go:421 level=debug component=silences msg="Maintenance done" duration=16.198736ms size=0
ts=2024-09-04T09:44:48.193Z caller=nflog.go:341 level=debug component=nflog msg="Maintenance done" duration=21.641132ms size=97
ts=2024-09-04T09:45:03.408Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:45:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:43602\n"
ts=2024-09-04T09:45:18.393Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:45:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:46:18.402Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:46:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:47:03.426Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:47:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:51826\n"
ts=2024-09-04T09:47:18.412Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:47:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:47:22.208Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T09:47:22.249Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T09:47:34.221Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:47:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:44926\n"
ts=2024-09-04T09:48:18.423Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:48:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:48:34.230Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:48:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:59180\n"
ts=2024-09-04T09:49:03.444Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:49:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:54552\n"
ts=2024-09-04T09:49:18.434Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:49:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:49:34.239Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:49:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:41346\n"
ts=2024-09-04T09:50:03.454Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:50:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:51776\n"
ts=2024-09-04T09:50:18.445Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:50:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:51:03.463Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:51:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:42490\n"
ts=2024-09-04T09:51:18.454Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:51:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:51:34.258Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:51:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:35632\n"
ts=2024-09-04T09:52:03.473Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:52:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:44406\n"
ts=2024-09-04T09:52:18.464Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:52:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:52:22.209Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T09:52:22.262Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T09:52:34.268Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:52:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:56626\n"
ts=2024-09-04T09:53:03.485Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:53:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:47642\n"
ts=2024-09-04T09:53:18.474Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:53:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:53:34.277Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:53:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:57180\n"
ts=2024-09-04T09:54:03.495Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:54:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:53606\n"
ts=2024-09-04T09:54:18.484Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:54:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:55:18.493Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:55:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:56:18.504Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:56:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:57:18.513Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:57:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:57:22.210Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T09:58:03.538Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:58:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:35528\n"
ts=2024-09-04T09:58:18.523Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:58:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:58:34.322Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:58:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:46782\n"
ts=2024-09-04T09:59:18.533Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:59:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:59:48.171Z caller=silence.go:413 level=debug component=silences msg="Running maintenance"
ts=2024-09-04T09:59:48.171Z caller=nflog.go:334 level=debug component=nflog msg="Running maintenance"
ts=2024-09-04T09:59:48.180Z caller=silence.go:421 level=debug component=silences msg="Maintenance done" duration=9.034802ms size=0
ts=2024-09-04T09:59:48.186Z caller=nflog.go:341 level=debug component=nflog msg="Maintenance done" duration=15.432937ms size=97
ts=2024-09-04T10:00:03.559Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:00:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:49304\n"
ts=2024-09-04T10:00:18.542Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:00:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T10:00:34.340Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:00:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:40368\n"
ts=2024-09-04T10:01:03.569Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:01:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:51818\n"
ts=2024-09-04T10:01:18.552Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:01:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T10:02:03.579Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:02:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:38168\n"
ts=2024-09-04T10:02:18.563Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:02:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T10:02:22.211Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T10:02:22.251Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T10:03:18.572Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:03:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T10:03:34.368Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:03:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:52788\n"
ts=2024-09-04T10:04:18.583Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:04:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T10:05:03.609Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:05:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:51086\n"
ts=2024-09-04T10:05:18.594Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:05:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T10:05:34.387Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:05:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:51002\n"
ts=2024-09-04T10:06:18.603Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:06:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T10:06:34.396Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:06:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:49618\n"
ts=2024-09-04T10:07:03.629Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:07:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:35648\n"
ts=2024-09-04T10:07:18.612Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:07:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T10:07:22.212Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T10:07:22.238Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][resolved]"
ts=2024-09-04T10:07:23.596Z caller=nflog.go:533 level=debug component=nflog msg="gossiping new entry" entry="entry:<group_key:\"{}:{alert_id=\\\"101\\\", instance=\\\"11792\\\"}\" receiver:<group_name:\"Mail\" integration:\"email\" > timestamp:<seconds:1725444443 nanos:484411839 > resolved_alerts:9876396428224961045 > expires_at:<seconds:1725530843 nanos:484411839 > "
ts=2024-09-04T10:07:38.406Z caller=notify.go:860 level=debug component=dispatcher receiver=Mail integration=email[0] aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" alerts="[High Core Usage[14f2e3b][active]]" msg="Notify success" attempts=1 duration=1.192928752s
ts=2024-09-04T10:07:53.770Z caller=nflog.go:533 level=debug component=nflog msg="gossiping new entry" entry="entry:<group_key:\"{}:{alert_id=\\\"101\\\", instance=\\\"11792\\\"}\" receiver:<group_name:\"Mail\" integration:\"email\" > timestamp:<seconds:1725444473 nanos:606517691 > resolved_alerts:9876396428224961045 > expires_at:<seconds:1725530873 nanos:606517691 > "
ts=2024-09-04T10:08:03.640Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:08:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:37822\n"
ts=2024-09-04T10:08:18.622Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:08:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T10:08:34.414Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:08:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:52412\n"
ts=2024-09-04T10:09:18.631Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:09:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T10:09:34.425Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:09:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:43512\n"
ts=2024-09-04T10:10:03.659Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:10:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:49332\n"
ts=2024-09-04T10:10:18.640Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:10:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T10:10:34.435Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:10:34 [DEBUG] memberlist: Stream connection from=10.2.0.86:56932\n"
ts=2024-09-04T10:11:03.669Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:11:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:58928\n"
ts=2024-09-04T10:11:18.650Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:11:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
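To make the replica logs above easier to compare, the memberlist and maintenance lines can be filtered out so only the dispatcher, nflog and notify entries remain. This is a minimal Python sketch, assuming each replica's log is saved as plain text with one entry per line. The names `interesting_lines` and `nflog_entry_time` are made up for illustration and are not part of Alertmanager.

```python
import re
from datetime import datetime, timezone

# Components whose debug lines are routine background noise in these logs
# (assumption: only the dispatcher/nflog/notify lines matter for this issue).
NOISY_COMPONENTS = {"cluster", "silences"}


def interesting_lines(lines):
    """Yield log lines that are not memberlist gossip or maintenance chatter."""
    for line in lines:
        m = re.search(r"\bcomponent=(\w+)", line)
        if m and m.group(1) in NOISY_COMPONENTS:
            continue
        if 'msg="Running maintenance"' in line or 'msg="Maintenance done"' in line:
            continue
        yield line


def nflog_entry_time(line):
    """Return the UTC time stored in a 'gossiping new entry' line, or None.

    Only the entry's own `timestamp:<seconds:...>` field is read;
    the `expires_at` field is ignored.
    """
    m = re.search(r"timestamp:<seconds:(\d+)", line)
    if m is None:
        return None
    return datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
```

For example, passing an open log file to `interesting_lines` leaves only the "Received alert", "flushing", "gossiping new entry" and "Notify success" lines. The `seconds:1725442643` in the first nflog entry decodes to 09:37:23 UTC, which matches the `ts` of the line that gossips it.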
  • Replica-3 Logs
ts=2024-09-04T09:16:33.923Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:16:33 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:17:18.144Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:17:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:39534\n"
ts=2024-09-04T09:17:22.199Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][resolved]"
ts=2024-09-04T09:17:22.199Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][resolved]]"
ts=2024-09-04T09:17:22.235Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][resolved]"
ts=2024-09-04T09:17:33.934Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:17:33 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:18:03.144Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:18:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:41788\n"
ts=2024-09-04T09:18:33.944Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:18:33 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:19:18.162Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:19:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:55222\n"
ts=2024-09-04T09:19:33.953Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:19:33 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:20:03.163Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:20:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:44664\n"
ts=2024-09-04T09:20:33.964Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:20:33 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:21:03.171Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:21:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:40994\n"
ts=2024-09-04T09:21:18.179Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:21:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:40988\n"
ts=2024-09-04T09:21:33.974Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:21:33 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:22:03.180Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:22:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:45258\n"
ts=2024-09-04T09:22:22.200Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][resolved]]"
ts=2024-09-04T09:22:33.983Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:22:33 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:23:18.198Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:23:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:38244\n"
ts=2024-09-04T09:23:33.992Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:23:33 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:24:03.198Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:24:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:40574\n"
ts=2024-09-04T09:24:34.001Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:24:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:25:03.207Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:25:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:60140\n"
ts=2024-09-04T09:25:34.011Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:25:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:26:18.224Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:26:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:43684\n"
ts=2024-09-04T09:26:34.021Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:26:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:27:03.227Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:27:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:38702\n"
ts=2024-09-04T09:27:18.234Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:27:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:37884\n"
ts=2024-09-04T09:27:34.031Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:27:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:28:18.243Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:28:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:39366\n"
ts=2024-09-04T09:28:34.040Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:28:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:29:18.251Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:29:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:36836\n"
ts=2024-09-04T09:29:34.049Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:29:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:29:48.084Z caller=silence.go:413 level=debug component=silences msg="Running maintenance"
ts=2024-09-04T09:29:48.084Z caller=nflog.go:334 level=debug component=nflog msg="Running maintenance"
ts=2024-09-04T09:29:48.100Z caller=silence.go:421 level=debug component=silences msg="Maintenance done" duration=16.078188ms size=0
ts=2024-09-04T09:29:48.106Z caller=nflog.go:341 level=debug component=nflog msg="Maintenance done" duration=22.375412ms size=97
ts=2024-09-04T09:30:34.059Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:30:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:31:03.265Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:31:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:51402\n"
ts=2024-09-04T09:31:34.069Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:31:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:32:03.275Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:32:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:36680\n"
ts=2024-09-04T09:32:34.079Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:32:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:33:03.284Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:33:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:42564\n"
ts=2024-09-04T09:33:18.289Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:33:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:52578\n"
ts=2024-09-04T09:33:34.088Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:33:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:34:18.298Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:34:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:51102\n"
ts=2024-09-04T09:34:34.098Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:34:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:35:18.306Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:35:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:41502\n"
ts=2024-09-04T09:35:34.107Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:35:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:36:18.315Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:36:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:34822\n"
ts=2024-09-04T09:36:34.117Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:36:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:37:03.327Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:37:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:37984\n"
ts=2024-09-04T09:37:18.324Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:37:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:42298\n"
ts=2024-09-04T09:37:22.272Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T09:37:22.272Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T09:37:23.644Z caller=notify.go:860 level=debug component=dispatcher receiver=Mail integration=email[0] aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" alerts="[High Core Usage[14f2e3b][active]]" msg="Notify success" attempts=1 duration=1.371819884s
ts=2024-09-04T09:37:34.127Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:37:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:38:34.138Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:38:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:39:34.147Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:39:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:40:18.352Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:40:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:43436\n"
ts=2024-09-04T09:40:34.157Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:40:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:41:03.363Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:41:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:44302\n"
ts=2024-09-04T09:41:18.361Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:41:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:35358\n"
ts=2024-09-04T09:41:34.166Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:41:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:42:03.373Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:42:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:41838\n"
ts=2024-09-04T09:42:18.369Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:42:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:59692\n"
ts=2024-09-04T09:42:22.241Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T09:42:22.280Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T09:42:22.280Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T09:42:34.174Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:42:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:43:03.390Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:43:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:42306\n"
ts=2024-09-04T09:43:18.378Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:43:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:47456\n"
ts=2024-09-04T09:43:34.182Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:43:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:44:34.191Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:44:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:44:48.084Z caller=silence.go:413 level=debug component=silences msg="Running maintenance"
ts=2024-09-04T09:44:48.084Z caller=nflog.go:334 level=debug component=nflog msg="Running maintenance"
ts=2024-09-04T09:44:48.095Z caller=silence.go:421 level=debug component=silences msg="Maintenance done" duration=10.820719ms size=0
ts=2024-09-04T09:44:48.102Z caller=nflog.go:341 level=debug component=nflog msg="Maintenance done" duration=17.789108ms size=97
ts=2024-09-04T09:45:18.395Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:45:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:39668\n"
ts=2024-09-04T09:45:34.202Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:45:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:46:03.417Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:46:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:46386\n"
ts=2024-09-04T09:46:34.210Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:46:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:47:22.281Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T09:47:22.293Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T09:47:34.219Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:47:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:48:03.435Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:48:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:48088\n"
ts=2024-09-04T09:48:34.228Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:48:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:49:18.438Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:49:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:51628\n"
ts=2024-09-04T09:49:34.237Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:49:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:50:18.447Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:50:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:59724\n"
ts=2024-09-04T09:50:34.246Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:50:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:51:18.456Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:51:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:54944\n"
ts=2024-09-04T09:51:34.256Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:51:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:52:18.466Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:52:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:57834\n"
ts=2024-09-04T09:52:22.281Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T09:52:34.265Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:52:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:53:34.274Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:53:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:54:34.283Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:54:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:55:03.506Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:55:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:48190\n"
ts=2024-09-04T09:55:18.496Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:55:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:43244\n"
ts=2024-09-04T09:55:34.293Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:55:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:56:03.516Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:56:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:45058\n"
ts=2024-09-04T09:56:34.302Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:56:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:57:03.525Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:57:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:58030\n"
ts=2024-09-04T09:57:18.517Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:57:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:55640\n"
ts=2024-09-04T09:57:22.209Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T09:57:22.258Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T09:57:22.282Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T09:57:34.311Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:57:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:58:34.320Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:58:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T09:59:03.548Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:59:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:59404\n"
ts=2024-09-04T09:59:18.535Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:59:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:43784\n"
ts=2024-09-04T09:59:34.328Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:59:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T09:59:48.084Z caller=silence.go:413 level=debug component=silences msg="Running maintenance"
ts=2024-09-04T09:59:48.084Z caller=nflog.go:334 level=debug component=nflog msg="Running maintenance"
ts=2024-09-04T09:59:48.097Z caller=silence.go:421 level=debug component=silences msg="Maintenance done" duration=12.83806ms size=0
ts=2024-09-04T09:59:48.102Z caller=nflog.go:341 level=debug component=nflog msg="Maintenance done" duration=17.977688ms size=97
ts=2024-09-04T10:00:34.338Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:00:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T10:01:34.347Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:01:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T10:02:22.227Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T10:02:22.283Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T10:02:34.356Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:02:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T10:03:03.588Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:03:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:59094\n"
ts=2024-09-04T10:03:18.575Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:03:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:45702\n"
ts=2024-09-04T10:03:34.365Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:03:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T10:04:03.599Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:04:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:45146\n"
ts=2024-09-04T10:04:18.586Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:04:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:55376\n"
ts=2024-09-04T10:04:34.375Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:04:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T10:05:34.384Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:05:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T10:06:03.619Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:06:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:38942\n"
ts=2024-09-04T10:06:18.605Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:06:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:46892\n"
ts=2024-09-04T10:06:34.394Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:06:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T10:07:22.226Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][resolved]"
ts=2024-09-04T10:07:22.284Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][resolved]]"
ts=2024-09-04T10:07:23.484Z caller=notify.go:860 level=debug component=dispatcher receiver=Mail integration=email[0] aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" alerts="[High Core Usage[14f2e3b][resolved]]" msg="Notify success" attempts=1 duration=1.199657271s
ts=2024-09-04T10:07:34.402Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:07:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T10:07:38.565Z caller=nflog.go:533 level=debug component=nflog msg="gossiping new entry" entry="entry:<group_key:\"{}:{alert_id=\\\"101\\\", instance=\\\"11792\\\"}\" receiver:<group_name:\"Mail\" integration:\"email\" > timestamp:<seconds:1725444458 nanos:406291780 > firing_alerts:9876396428224961045 > expires_at:<seconds:1725530858 nanos:406291780 > "
ts=2024-09-04T10:07:53.770Z caller=nflog.go:533 level=debug component=nflog msg="gossiping new entry" entry="entry:<group_key:\"{}:{alert_id=\\\"101\\\", instance=\\\"11792\\\"}\" receiver:<group_name:\"Mail\" integration:\"email\" > timestamp:<seconds:1725444473 nanos:606517691 > resolved_alerts:9876396428224961045 > expires_at:<seconds:1725530873 nanos:606517691 > "
ts=2024-09-04T10:08:34.411Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:08:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T10:09:03.649Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:09:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:45426\n"
ts=2024-09-04T10:09:34.421Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:09:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T10:10:34.432Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:10:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519Z1Y5QRXZ1Z57J2YAQ5 10.2.0.193:9094\n"
ts=2024-09-04T10:11:18.652Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:11:18 [DEBUG] memberlist: Stream connection from=10.2.0.193:36276\n"
ts=2024-09-04T10:11:34.442Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:11:34 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y51CWZDJ72QD2SG2F4GQ8Z 10.2.1.153:9094\n"
ts=2024-09-04T10:12:03.678Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 10:12:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:33906\n"

The logs above start from the moment a fresh alert fired. 30 minutes later, this alert was resolved. But at the very moment we receive the 'resolved' notification, Alertmanager sends another 'firing' notification (which should not happen, because data ingestion has been stopped completely). This extra 'firing' alert is then resolved almost immediately too. Overall, there is one extra pair of 'fired'/'resolved' notifications.

@grobinson-grafana
Contributor

grobinson-grafana commented Sep 4, 2024

Hi! 👋

What's happening here is an unfortunate side effect of how high availability works. I don't think there is anything you can do about this either I'm afraid.

The sequence of firing, resolved, firing, resolved notifications can occur in rare cases when Prometheus sends the resolved alert to Alertmanager around the same time as the next flush. When this happens, some Alertmanager replicas can see the alert as resolved while others still see it as active. This is what happened here.

We can see that Alertmanager 3 received the resolved alert from Prometheus. At almost the same time, it flushed the alert, as it had been 5 minutes since the last flush (your group_interval is 5m). It just so happens that the flush was 58ms after the resolved alert was received:

ts=2024-09-04T10:07:22.226Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][resolved]"
ts=2024-09-04T10:07:22.284Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][resolved]]"

However, if we look at Alertmanager 2, we can see that it flushed the alert 26ms before it received the resolved alert from Prometheus. Alertmanager 2 still thinks the alert is firing:

ts=2024-09-04T10:07:22.212Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T10:07:22.238Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][resolved]"

What happens next is Alertmanager 3 sends the resolved notification, and gossips that it has sent a resolved notification to Alertmanagers 1 and 2:

ts=2024-09-04T10:07:23.484Z caller=notify.go:860 level=debug component=dispatcher receiver=Mail integration=email[0] aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" alerts="[High Core Usage[14f2e3b][resolved]]" msg="Notify success" attempts=1 duration=1.199657271s

Alertmanager 2 waits 15 seconds (--peer-timeout=15s) and sees that a resolved notification was sent, but in Alertmanager 2 the alert is still firing, so it sends a firing notification, and gossips that it has sent a firing notification to Alertmanagers 1 and 3:

ts=2024-09-04T10:07:23.596Z caller=nflog.go:533 level=debug component=nflog msg="gossiping new entry" entry="entry:<group_key:\"{}:{alert_id=\\\"101\\\", instance=\\\"11792\\\"}\" receiver:<group_name:\"Mail\" integration:\"email\" > timestamp:<seconds:1725444443 nanos:484411839 > resolved_alerts:9876396428224961045 > expires_at:<seconds:1725530843 nanos:484411839 > "
ts=2024-09-04T10:07:38.406Z caller=notify.go:860 level=debug component=dispatcher receiver=Mail integration=email[0] aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" alerts="[High Core Usage[14f2e3b][active]]" msg="Notify success" attempts=1 duration=1.192928752s

Alertmanager 1 waits 30 seconds (2 * --peer-timeout=15s) and sees that the last notification sent was a firing notification. In Alertmanager 1, however, the alert is resolved because, like Alertmanager 3, it received the resolved alert before the flush:

ts=2024-09-04T10:07:22.210Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][resolved]"
ts=2024-09-04T10:07:22.320Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][resolved]]"

Alertmanager 1 then sends the resolved notification, and gossips to Alertmanagers 2 and 3 that a resolved notification was sent:

ts=2024-09-04T10:07:38.565Z caller=nflog.go:533 level=debug component=nflog msg="gossiping new entry" entry="entry:<group_key:\"{}:{alert_id=\\\"101\\\", instance=\\\"11792\\\"}\" receiver:<group_name:\"Mail\" integration:\"email\" > timestamp:<seconds:1725444458 nanos:406291780 > firing_alerts:9876396428224961045 > expires_at:<seconds:1725530858 nanos:406291780 > "
ts=2024-09-04T10:07:53.606Z caller=notify.go:860 level=debug component=dispatcher receiver=Mail integration=email[0] aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" alerts="[High Core Usage[14f2e3b][resolved]]" msg="Notify success" attempts=1 duration=1.285027004s

The next time Alertmanager 2 flushes it will see that the alert is resolved. And since the last notification sent was a resolved notification, it will do nothing.

I hope this helps!
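
The sequence above can be replayed with a small model. This is only a simplified sketch, not Alertmanager's code: `LogEntry` and `decide` are hypothetical names, and the model ignores `repeat_interval` and `send_resolved`. Each replica compares its own view of the alert with the last notification it learned about through gossip, in the order the replicas got their turn:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """Last notification recorded for one alert group (hypothetical, simplified)."""
    firing: frozenset = frozenset()
    resolved: frozenset = frozenset()


def decide(entry, firing, resolved):
    """Return "firing", "resolved", or None for one replica's turn.

    entry    -- last notification known through gossip, or None
    firing   -- alerts this replica believed were firing when it flushed
    resolved -- alerts this replica believed were resolved when it flushed
    """
    if entry is None:
        send = bool(firing)
    elif not firing <= entry.firing:
        send = True                      # a firing alert the last notification missed
    elif not firing:
        send = bool(entry.firing)        # all resolved locally, last notification was firing
    else:
        send = not resolved <= entry.resolved
    if not send:
        return None
    return "firing" if firing else "resolved"


ALERT = "14f2e3b"
# Local view of each replica at its 10:07:22 flush, in the order they notified.
views = [
    ("AM3", frozenset(), frozenset({ALERT})),   # received resolved before flushing
    ("AM2", frozenset({ALERT}), frozenset()),   # flushed 26ms before resolved arrived
    ("AM1", frozenset(), frozenset({ALERT})),   # received resolved before flushing
]

entry = LogEntry(firing=frozenset({ALERT}))     # the original firing notification
sent = []
for name, firing, resolved in views:
    kind = decide(entry, firing, resolved)
    if kind:
        sent.append((name, kind))
        entry = LogEntry(firing=firing, resolved=resolved)  # gossiped to the peers

print(sent)
# On its next flush AM2 also sees the alert as resolved and has nothing to send.
print(decide(entry, frozenset(), frozenset({ALERT})))
```

In this model the stale view on Alertmanager 2 is enough to produce the extra firing/resolved pair, even though every replica behaves correctly in isolation.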

@ktnvaish22
Author

ktnvaish22 commented Sep 5, 2024

Thanks @grobinson-grafana for the helpful explanation of the root cause.
Can we avoid this issue by keeping the evaluation interval (interval) in the alert rules at 5m while setting group_wait in Alertmanager to a value that is not a multiple of 5m, say 7m or 12m? Or can this configuration at least reduce the chances of running into the issue? What do you think?

@grobinson-grafana
Contributor

It might help. You'll need to test it to find out, I'm afraid. Remember though, 5m and 7m align every 35 minutes (at minutes 35, 70, 105, etc.), so there will still be overlap between evaluations and flushes.
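
As a quick check of that arithmetic, here is a small sketch. It assumes both schedules start at minute 0 and run on fixed periods, which is an idealization (real flush times depend on when the alert group was created), and `coincidence_period` is just an illustrative name:

```python
from math import gcd


def coincidence_period(eval_interval_min, flush_interval_min):
    """Minutes between moments where both periodic schedules fire together."""
    return eval_interval_min * flush_interval_min // gcd(eval_interval_min, flush_interval_min)


period = coincidence_period(5, 7)
# Brute-force cross-check over the first 140 minutes.
overlaps = [m for m in range(1, 141) if m % 5 == 0 and m % 7 == 0]
print(period, overlaps)
```

Picking intervals that are not multiples of each other makes coincidences less frequent, but it cannot remove them.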

Another option is to disable resolved notifications, as sometimes resolved notifications can create a lot of noise and even flap (as is the case here). However, this also depends on how critical resolved notifications are for your monitoring.

@ktnvaish22
Author

Hey @grobinson-grafana!! I am trying to debug the logs where 2 pairs (fired and resolved) of duplicates were notified. I have a few doubts:

  1. Does an Alertmanager instance create a gossiping entry when it receives a new alert or after flush?
  2. Why does --peer-timeout keep doubling, as you explained above?
  3. When does the peer-timeout timer start and what is its relevance?

I just want to be able to point out in the logs myself exactly where the race between the flush and the alert being received occurred.
It would be great if you could briefly explain how gossiping works.
Thanks!!

@grobinson-grafana
Contributor

  1. Does an Alertmanager instance create a gossiping entry when it receives a new alert or after flush?

Neither. The entry is gossiped after the notification is sent.

  2. Why does --peer-timeout keep doubling, as you explained above?

The peers are arranged into positions, such as 0, 1, 2 etc. Peer 0 is the first to send the notification. If Peer 1 hasn't received a gossip from Peer 0 within --peer-timeout then it sends a notification. If Peer 2 hasn't received a gossip from Peer 1 within 2 * --peer-timeout then it sends a notification, etc.

  3. When does the peer-timeout timer start and what is its relevance?

It starts when the flush happens.
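
The position-based wait can be sketched like this. It is a minimal illustration of the rule described above, not the actual implementation: `cluster_wait` is a hypothetical name, and 15 seconds is the `--peer-timeout` value mentioned earlier in this thread.

```python
from datetime import timedelta


def cluster_wait(position, peer_timeout=timedelta(seconds=15)):
    """How long a peer waits after its flush before it considers notifying."""
    return position * peer_timeout


for position in range(3):
    print(position, cluster_wait(position).total_seconds())
```

So the wait grows linearly with the peer's position rather than doubling: with three replicas the peers wait 0s, 15s and 30s after their flush.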

@ktnvaish22
Author

Hey @grobinson-grafana!! I was debugging the logs to point out the race condition on a fresh set of logs (attached).
Alertmanager_logs.zip

I observed that there is more going on than just a race between the flushing of alerts and a new alert being received. I have tried to capture the flow of events in each Alertmanager instance as per the new logs.

Screenshot 2024-09-11 at 4 49 04 PM

Here, at timestamp 15.37.22, AM2 did not receive any alert from upstream. At timestamp 15.42.22, AM3 did not receive any alert, while AM1 received the alert twice. Similarly, at timestamp 15.47.22, AM2 received 3 alerts in total (1 before the flush and 2 after it), while AM1 and AM3 did not get the 'resolved' alert at all. This will definitely cause disagreement between the Alertmanager replicas and thus trigger duplicate alerts.

This is the case not just with the new logs but also with the older ones (attached in the earlier comment). At timestamp 9.57.22, AM3 received two alerts while AM2 received no alert at all.

AM3 (from older logs):

ts=2024-09-04T09:57:22.209Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T09:57:22.258Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][active]"
ts=2024-09-04T09:57:22.282Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"

AM2 (from older logs):

ts=2024-09-04T09:57:18.513Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:57:18 [DEBUG] memberlist: Initiating push/pull sync with: 01J6Y519SS43HS1TNCDFKVQK4M 10.2.0.86:9094\n"
ts=2024-09-04T09:57:22.210Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"
ts=2024-09-04T09:58:03.538Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/04 09:58:03 [DEBUG] memberlist: Stream connection from=10.2.1.153:35528\n"

Isn't this a bigger issue than just a rare race condition? Or am I missing something here?

@grobinson-grafana
Contributor

Hi! 👋 Yes, missing "Received alert" events can cause issues, as alerts must be sent to all Alertmanagers in the cluster to make sure the state of each alert is consistent across all replicas. I would recommend checking the VictoriaMetrics logs to see what happened at this time that prevented those alerts from being sent to Alertmanagers 2 and 3, and what caused the duplicates on Alertmanager 1.
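
When comparing replicas, it can help to pull only the dispatcher events out of each debug log. The script below is a rough sketch and not an official tool: the function names are made up, the regular expressions only target the two message shapes shown in this thread, a flush with several alerts is reported with the state of the last alert listed, and the 1 second window is an arbitrary choice.

```python
import re
from datetime import datetime, timedelta

TS = re.compile(r"^ts=(\S+)")
RECEIVED = re.compile(r'msg="Received alert" alert=".*\[(active|resolved)\]"')
FLUSH = re.compile(r'msg=flushing alerts="\[.*\[(active|resolved)\]\]"')


def dispatcher_events(lines):
    """Return (timestamp, kind, state) for received-alert and flush lines."""
    events = []
    for line in lines:
        ts = TS.search(line)
        if not ts:
            continue
        when = datetime.strptime(ts.group(1), "%Y-%m-%dT%H:%M:%S.%fZ")
        for kind, pattern in (("received", RECEIVED), ("flush", FLUSH)):
            match = pattern.search(line)
            if match:
                events.append((when, kind, match.group(1)))
    return events


def stale_flushes(events, window=timedelta(seconds=1)):
    """Flushes that still said 'active' although 'resolved' arrived shortly after."""
    found = []
    for when, kind, state in events:
        if (kind, state) != ("flush", "active"):
            continue
        for later, kind2, state2 in events:
            gap = later - when
            if (kind2, state2) == ("received", "resolved") and timedelta(0) <= gap <= window:
                found.append((when, gap // timedelta(milliseconds=1)))
    return found


# The two Alertmanager 2 lines quoted earlier in this thread.
am2_log = [
    r'ts=2024-09-04T10:07:22.212Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"101\", instance=\"11792\"}" msg=flushing alerts="[High Core Usage[14f2e3b][active]]"',
    r'ts=2024-09-04T10:07:22.238Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Core Usage[14f2e3b][resolved]"',
]

print(stale_flushes(dispatcher_events(am2_log)))
```

Running the same functions over each replica's full log and counting the "received" events per flush interval would also show which replicas were skipped by the sender.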

@ktnvaish22
Author

ktnvaish22 commented Sep 12, 2024

Yes, missing "Received alerts" can cause issues, as alerts must be sent to all Alertmanagers in the cluster to make sure the state of each alert is consistent across all replicas

Hi @grobinson-grafana!!

  1. Should all the Alertmanager replicas receive "resolved alert" from VMAlerts?
  2. Shouldn't the gossip feature ensure that if one replica receives an alert, it communicates it to all other replicas so that they all agree on the state of the alert?
  3. Can you please point me to some documentation of gossiping in HA Alertmanager and how should we expect it to work?

@grobinson-grafana
Contributor

Hi! 👋

  1. Yes. I don't know much about VMAlerts, but this is how Prometheus works.
  2. The gossip feature doesn't replicate alerts, it just replicates what is known as the nflog, which keeps track of the most recent notification sent, including a timestamp of when it was sent and which firing and resolved alerts were included, for each alert group in Alertmanager.
  3. You can check out https://github.com/prometheus/alertmanager#high-availability and https://promlabs.com/blog/2023/08/31/high-availability-for-prometheus-and-alertmanager-an-overview.
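
To illustrate point 2, here is a rough sketch of what a replica could keep per alert group and receiver. The field names follow the "gossiping new entry" debug lines above, but the `Entry` class, the `merge` function, the "Mail/email" receiver string and the "newest timestamp wins" rule are simplifying assumptions for illustration, not the real nflog code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One notification log record (simplified, hypothetical)."""
    group_key: str
    receiver: str
    timestamp: int                      # seconds since the epoch
    firing_alerts: frozenset = frozenset()
    resolved_alerts: frozenset = frozenset()


def merge(state, incoming):
    """Keep only the newest entry per (group, receiver); return True if stored."""
    key = (incoming.group_key, incoming.receiver)
    current = state.get(key)
    if current is None or current.timestamp < incoming.timestamp:
        state[key] = incoming
        return True
    return False


GROUP = '{}:{alert_id="101", instance="11792"}'
ALERT_HASH = 9876396428224961045        # alert fingerprint from the log lines above

firing = Entry(GROUP, "Mail/email", 1725444458, firing_alerts=frozenset({ALERT_HASH}))
resolved = Entry(GROUP, "Mail/email", 1725444473, resolved_alerts=frozenset({ALERT_HASH}))

state = {}
merge(state, resolved)
stale_accepted = merge(state, firing)   # older entry arriving late is ignored
print(stale_accepted, state[(GROUP, "Mail/email")].resolved_alerts)
```

Note that nothing in such a record carries the alert itself or its current state, which is why each replica still has to receive every alert directly from the sender.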

@ktnvaish22
Author

Hey @grobinson-grafana!! We changed the VM Alert config and made sure that each Alertmanager replica gets the alert from upstream.
We tried to reproduce the issue and captured the logs for analysis. We came across a couple of unexpected events in the logs, which may be causing the duplicates.

  1. Alertmanager 1 (AM1) flushed at 19:44:42.361, but we see another flush at 19:44:57.362 (15s after the initial one).
    AM1 logs:
ts=2024-09-16T19:44:42.341Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[7ac132d][active]"
ts=2024-09-16T19:44:42.341Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[69b21ca][active]"
ts=2024-09-16T19:44:42.343Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[7ac132d][active]"
ts=2024-09-16T19:44:42.343Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[69b21ca][active]"
ts=2024-09-16T19:44:42.361Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"104\", instance=\"11792\"}" msg=flushing alerts="[High Test Usage[69b21ca][active] High Test Usage[7ac132d][active]]"
ts=2024-09-16T19:44:42.368Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[69b21ca][active]"
ts=2024-09-16T19:44:42.368Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[7ac132d][active]"
ts=2024-09-16T19:44:42.904Z caller=nflog.go:533 level=debug component=nflog msg="gossiping new entry" entry="entry:<group_key:\"{}:{alert_id=\\\"104\\\", instance=\\\"11792\\\"}\" receiver:<group_name:\"Mail\" integration:\"webex\" > timestamp:<seconds:1726515882 nanos:870502670 > firing_alerts:16133756477374130202 firing_alerts:2789514587224029741 > expires_at:<seconds:1726602282 nanos:870502670 > "
ts=2024-09-16T19:44:43.704Z caller=nflog.go:533 level=debug component=nflog msg="gossiping new entry" entry="entry:<group_key:\"{}:{alert_id=\\\"104\\\", instance=\\\"11792\\\"}\" receiver:<group_name:\"Mail\" integration:\"email\" > timestamp:<seconds:1726515883 nanos:694582580 > firing_alerts:16133756477374130202 firing_alerts:2789514587224029741 > expires_at:<seconds:1726602283 nanos:694582580 > "
ts=2024-09-16T19:44:57.362Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"104\", instance=\"11792\"}" msg=flushing alerts="[High Test Usage[69b21ca][active] High Test Usage[7ac132d][active]]"
ts=2024-09-16T19:45:04.066Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/16 19:45:04 [DEBUG] memberlist: Initiating push/pull sync with: 01J7Y4HR5395YH2Q09GBW1ME14 10.2.1.48:9094\n"

After this, subsequent flushes for AM1 are at 19:49:57, 19:54:57, and so on,
while flushes for AM2 and AM3 are at 19:49:42, 19:54:42, and so on.
This 15s offset can cause the 15s peer-timeout to elapse before the other peers flush, and consequently result in duplicates.
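
The offset can be checked with a little arithmetic. The sketch below uses the two AM1 flush timestamps from the logs above and the 5m group_interval from the config. It assumes, as a simplification, that AM2 and AM3 flushed at roughly the same moment as AM1's first flush.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S.%f"

# AM1 flush timestamps taken from the dispatcher debug lines above.
first_flush = datetime.strptime("2024-09-16T19:44:42.361", FMT)
second_flush = datetime.strptime("2024-09-16T19:44:57.362", FMT)
group_interval = timedelta(minutes=5)

# How far AM1's flush schedule has drifted from its first flush.
offset = second_flush - first_flush


def next_flushes(start, count):
    """Return the next `count` flush times, one group_interval apart."""
    return [start + group_interval * i for i in range(1, count + 1)]


am1 = next_flushes(second_flush, 2)    # AM1 now follows the later schedule
others = next_flushes(first_flush, 2)  # assumed schedule for AM2 and AM3

print("offset:", offset.total_seconds(), "s")
for a, o in zip(am1, others):
    print("AM1", a.strftime("%H:%M:%S"), "| AM2/AM3", o.strftime("%H:%M:%S"))
```

The computed times line up with the 19:49:57 / 19:49:42 flushes reported above.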

  2. AM3 flushed a 'resolved' at 19:59:42.364 and sent the notification at 19:59:42.760 without waiting for peer-timeout, even though the other two AM instances were not in sync on the alert state (as per the logs).

AM3 logs:

ts=2024-09-16T19:59:42.330Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[7ac132d][resolved]"
ts=2024-09-16T19:59:42.330Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[69b21ca][resolved]"
ts=2024-09-16T19:59:42.364Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[7ac132d][resolved]"
ts=2024-09-16T19:59:42.364Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[69b21ca][resolved]"
ts=2024-09-16T19:59:42.364Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"104\", instance=\"11792\"}" msg=flushing alerts="[High Test Usage[69b21ca][resolved] High Test Usage[7ac132d][resolved]]"
ts=2024-09-16T19:59:42.364Z caller=webex.go:75 level=debug integration=webex incident="{}:{alert_id=\"104\", instance=\"11792\"}"
ts=2024-09-16T19:59:42.370Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[7ac132d][resolved]"
ts=2024-09-16T19:59:42.370Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[69b21ca][resolved]"
ts=2024-09-16T19:59:42.760Z caller=notify.go:860 level=debug component=dispatcher receiver=Mail integration=webex[0] aggrGroup="{}:{alert_id=\"104\", instance=\"11792\"}" alerts="[High Test Usage[69b21ca][resolved] High Test Usage[7ac132d][resolved]]" msg="Notify success" attempts=1 duration=396.074516ms
ts=2024-09-16T19:59:45.242Z caller=notify.go:860 level=debug component=dispatcher receiver=Mail integration=email[0] aggrGroup="{}:{alert_id=\"104\", instance=\"11792\"}" alerts="[High Test Usage[69b21ca][resolved] High Test Usage[7ac132d][resolved]]" msg="Notify success" attempts=1 duration=2.878230343s
ts=2024-09-16T20:00:04.228Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/16 20:00:04 [DEBUG] memberlist: Stream connection from=10.2.0.254:59320\n"

AM2 logs:

ts=2024-09-16T19:59:42.341Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"104\", instance=\"11792\"}" msg=flushing alerts="[High Test Usage[69b21ca][active] High Test Usage[7ac132d][active]]"
ts=2024-09-16T19:59:42.341Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[7ac132d][resolved]"
ts=2024-09-16T19:59:42.341Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[69b21ca][resolved]"
ts=2024-09-16T19:59:42.416Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[7ac132d][resolved]"
ts=2024-09-16T19:59:42.416Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[69b21ca][resolved]"
ts=2024-09-16T19:59:42.417Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[7ac132d][resolved]"
ts=2024-09-16T19:59:42.417Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[69b21ca][resolved]"
ts=2024-09-16T19:59:42.904Z caller=nflog.go:533 level=debug component=nflog msg="gossiping new entry" entry="entry:<group_key:\"{}:{alert_id=\\\"104\\\", instance=\\\"11792\\\"}\" receiver:<group_name:\"Mail\" integration:\"webex\" > timestamp:<seconds:1726516782 nanos:760677958 > resolved_alerts:16133756477374130202 resolved_alerts:2789514587224029741 > expires_at:<seconds:1726603182 nanos:760677958 > "

AM1 logs:

ts=2024-09-16T19:59:42.335Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[7ac132d][resolved]"
ts=2024-09-16T19:59:42.335Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[69b21ca][resolved]"
ts=2024-09-16T19:59:42.367Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[7ac132d][resolved]"
ts=2024-09-16T19:59:42.367Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[69b21ca][resolved]"
ts=2024-09-16T19:59:42.369Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[7ac132d][resolved]"
ts=2024-09-16T19:59:42.369Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert="High Test Usage[69b21ca][resolved]"
ts=2024-09-16T19:59:42.905Z caller=nflog.go:533 level=debug component=nflog msg="gossiping new entry" entry="entry:<group_key:\"{}:{alert_id=\\\"104\\\", instance=\\\"11792\\\"}\" receiver:<group_name:\"Mail\" integration:\"webex\" > timestamp:<seconds:1726516782 nanos:760677958 > resolved_alerts:16133756477374130202 resolved_alerts:2789514587224029741 > expires_at:<seconds:1726603182 nanos:760677958 > "
ts=2024-09-16T19:59:45.267Z caller=nflog.go:533 level=debug component=nflog msg="gossiping new entry" entry="entry:<group_key:\"{}:{alert_id=\\\"104\\\", instance=\\\"11792\\\"}\" receiver:<group_name:\"Mail\" integration:\"email\" > timestamp:<seconds:1726516785 nanos:242879144 > resolved_alerts:16133756477374130202 resolved_alerts:2789514587224029741 > expires_at:<seconds:1726603185 nanos:242879144 > "
ts=2024-09-16T19:59:47.772Z caller=cluster.go:341 level=debug component=cluster memberlist="2024/09/16 19:59:47 [DEBUG] memberlist: Stream connection from=10.2.1.48:35544\n"
ts=2024-09-16T19:59:57.364Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}:{alert_id=\"104\", instance=\"11792\"}" msg=flushing alerts="[High Test Usage[69b21ca][resolved] High Test Usage[7ac132d][resolved]]"

Do you suspect that something wrong with the cluster is causing this?

  3. When we say that AM instance waits for peer-timeout time for all other instances to come to an agreement on state of the alert, how exactly does each AM instance decide its state of the alert: is it the last alert state that was flushed, or the state of the last alert received by that instance? I think it should be the flushed state.

@grobinson-grafana
Contributor

  1. I'm not sure. I thought it could be due to the issue mentioned in "Remove immediate flush on reload/restart" (#3419), but I don't see a received alert or reload just before the flush.

  2. What position was AM3 in at that time? AM3 could have been at position 0, which means it would be the first to send a notification. You can check the position using the alertmanager_peer_position metric.

  3. To answer the last question:

    When we say that AM instance waits for peer-timeout time for all other instances to come to an agreement on state of the alert

    This isn't how it works, where did you see that? It doesn't wait for agreement on the state of the alert, it's just a failover timeout. When the timeout expires, it checks if the last notification sent is the same notification that it is about to send, and if so, skips sending a notification.
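
A minimal sketch of the behaviour described above, under two assumptions that are not confirmed in this thread: that the failover delay is the peer position multiplied by the peer timeout, and that the check is a plain comparison of the firing and resolved sets. The function names `cluster_wait` and `needs_notification` are hypothetical, and the real check in Alertmanager also takes other conditions into account.

```python
from datetime import timedelta


def cluster_wait(position: int, peer_timeout: timedelta) -> timedelta:
    """Failover delay before an instance tries to notify (assumed: position * timeout)."""
    return position * peer_timeout


def needs_notification(last_sent, about_to_send) -> bool:
    """Return False only when the logged notification matches the pending one.

    Both arguments are (firing, resolved) pairs of alert-fingerprint sets;
    last_sent is None when nothing has been logged for the group yet.
    """
    return last_sent is None or last_sent != about_to_send


peer_timeout = timedelta(seconds=15)  # the 15s value mentioned in this thread
waits = [cluster_wait(position, peer_timeout) for position in range(3)]

alerts = frozenset({"69b21ca", "7ac132d"})
firing = (alerts, frozenset())    # last logged notification: both alerts firing
resolved = (frozenset(), alerts)  # pending notification: both alerts resolved

# Position 0 does not wait; it sees the older "firing" entry and sends.
print(waits[0], needs_notification(firing, resolved))
# Position 1 waits, then sees the gossiped "resolved" entry and skips.
print(waits[1], needs_notification(resolved, resolved))
```

The point is that a later position only skips when its pending notification matches what was already logged; if it flushed with a different set of alerts, the comparison fails and it sends its own notification.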

@ktnvaish22
Author

I'm not sure. I thought it could be due to the issue mentioned in #3419, but I don't see a received alert or reload just before the flush.

Sure, I'll check the mentioned issue.

It doesn't wait for agreement on the state of the alert, it's just a failover timeout. When the timeout expires, it checks if the last notification sent is the same notification that it is about to send, and if so, skips sending a notification.

Thanks a lot for the clarification @grobinson-grafana! I misunderstood the peer-timeout, which left a gap in my analysis of the logs. I really wish we had documentation on the internals of gossiping. Anyway, thanks again!

So I believe we can conclude that the root cause is a race condition where the flush, in one or more AM instances, occurs just milliseconds before they receive a 'resolved' alert, causing a disagreement among the peers of the cluster and consequently duplicate notifications. Unfortunately, we have no fix for this.
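
To make the ordering concrete, here is a toy model of the race. It only replays the event order seen in the AM2 and AM3 logs above (flush before or after the 'resolved' alert arrives); the `Instance` class and `replay` function are hypothetical and greatly simplify what the dispatcher does.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    state: str = "active"  # alert state as last received from upstream

    def flush(self) -> str:
        """Snapshot the current state; this is what the instance will try to notify."""
        return self.state


def replay(events):
    """Apply events in order and return the state captured by the flush."""
    am = Instance()
    pending = None
    for event in events:
        if event == "receive_resolved":
            am.state = "resolved"
        elif event == "flush":
            pending = am.flush()
    return pending


# AM3: 'resolved' received at 19:59:42.330, flush at 19:59:42.364.
am3_pending = replay(["receive_resolved", "flush"])
# AM2: flush logged at 19:59:42.341 just before the 'resolved' alert is received.
am2_pending = replay(["flush", "receive_resolved"])

print("AM3 wants to notify:", am3_pending)
print("AM2 wants to notify:", am2_pending)
print("peers disagree:", am3_pending != am2_pending)
```

Because AM2's pending notification differs from the one AM3 already logged, it would not be treated as a duplicate of AM3's notification and would be sent as well.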

@ktnvaish22
Author

Hey @grobinson-grafana!
We have changed how we deploy our Alertmanagers. Instead of one deployment with multiple replicas, we now use 3 separate deployments with 1 replica each. We have also configured cluster.peer and cluster.listen-address so that these 3 separately deployed AMs form a cluster and gossip with each other.
We have tested this setup quite extensively and haven't received any duplicate notifications. The AM logs also show that the race condition did not occur, as each AM received the updated alert from at least one VMAlert before the flush.

  1. Technically, what difference does this setup make such that we did not hit the issue?
  2. Can this actually help us, or is it just that we haven't hit the race yet?
  3. Do you see any side effects on the overall functionality of HA AM if we deploy them separately?
