Add system shutdown timestamp #3111

Draft: wants to merge 1 commit into base: master
28 changes: 28 additions & 0 deletions collector/systemd_linux.go
@@ -68,6 +68,7 @@
unitTasksCurrentDesc *prometheus.Desc
unitTasksMaxDesc *prometheus.Desc
systemRunningDesc *prometheus.Desc
systemShutdownDesc *prometheus.Desc
summaryDesc *prometheus.Desc
nRestartsDesc *prometheus.Desc
timerLastTriggerDesc *prometheus.Desc
@@ -112,6 +113,11 @@
"Whether the system is operational (see 'systemctl is-system-running')",
nil, nil,
)
systemShutdownDesc := prometheus.NewDesc(
prometheus.BuildFQName(namespace, subsystem, "system_shutdown_timestamp"),
Review comment (Contributor):

i think that should be system_shutdown_timestamp_seconds, no?

"Time for a scheduled shutdown (see 'systemctl status systemd-shutdownd.service')",
Review comment (Contributor):

that command outputs:

Unit systemd-shutdownd.service could not be found.

here.

I found a systemd-shutdown(8) manual page, but it doesn't help much in understanding that component. In fact, I had a frustrating time trying to find any meaningful documentation on how that damn thing works... The org.freedesktop.login1(5) manual page does mention it though:

       ScheduledShutdown shows the value pair set with the
       ScheduleShutdown() method described above.

That's the property we're trying to fetch here... The method referenced there is:

       ScheduleShutdown() schedules a shutdown operation type at time
       usec in microseconds since the UNIX epoch.  type can be one of
       "poweroff", "dry-poweroff", "reboot", "dry-reboot", "halt", and
       "dry-halt". (The "dry-" variants do not actually execute the
       shutdown action.)  CancelScheduledShutdown() cancels a scheduled
       shutdown. The output parameter cancelled is true if a shutdown
       operation was scheduled.

... which is, frankly, not that helpful.

Review comment (Contributor):

I'm also not sure how to represent the "no shutdown scheduled" state. in my script, i used "zero seconds" as a value for that, but the property returned uses something else, which looks a lot like the maximum uint64 value (2^64-1 = 18446744073709551615 ≈ 1.8447 × 10^19)

also note that on my laptop, this morning, after the device went to sleep on its own after a timeout, dbus says this:

anarcat@angela:~$ busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown
(st) "suspend" 0

nil, nil,
)
summaryDesc := prometheus.NewDesc(
prometheus.BuildFQName(namespace, subsystem, "units"),
"Summary of systemd unit states", []string{"state"}, nil)
@@ -161,6 +167,7 @@
unitTasksCurrentDesc: unitTasksCurrentDesc,
unitTasksMaxDesc: unitTasksMaxDesc,
systemRunningDesc: systemRunningDesc,
systemShutdownDesc: systemShutdownDesc,
summaryDesc: summaryDesc,
nRestartsDesc: nRestartsDesc,
timerLastTriggerDesc: timerLastTriggerDesc,
@@ -261,10 +268,14 @@

if systemdVersion >= minSystemdVersionSystemState {
begin = time.Now()
err = c.collectSystemState(conn, ch)

Check failure on line 271 in collector/systemd_linux.go (GitHub Actions / lint): ineffectual assignment to err (ineffassign)
level.Debug(c.logger).Log("msg", "collectSystemState took", "duration_seconds", time.Since(begin).Seconds())
}

begin = time.Now()
err = c.collectScheduledShutdownMetrics(conn, ch)
level.Debug(c.logger).Log("msg", "collectScheduledShutdownMetrics took", "duration_seconds", time.Since(begin).Seconds())

return err
}

@@ -343,6 +354,23 @@
}
}

func (c *systemdCollector) collectScheduledShutdownMetrics(conn *dbus.Conn, ch chan<- prometheus.Metric) error {
var shutdownTimeUsec uint64

timestampValue, err := conn.GetServicePropertyContext(context.TODO(), "org.freedesktop.login1", "ScheduledShutdown")
Review comment (Contributor):

just so you know, i'm not sure this returns a single integer. if it behaves like the commandline tool, it returns a struct of two elements, a string and a uint64 (busctl prints the type signature (st) first). with a pending reboot:

root@perdulce:~# busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown
(st) "reboot" 1725545703588789

without:

anarcat@angela:~$ busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown
(st) "" 18446744073709551615

notice how the timestamp is in microseconds, and how completely out of whack it is when there's no scheduled shutdown (it is 2^64-1, the largest uint64). not sure what's going on there.

the script i wrote in #3110 (comment) does this somewhat properly, and outputs the following metrics, with the first example:

# HELP node_shutdown_scheduled_timestamp_seconds time of the next scheduled reboot, or zero
# TYPE node_shutdown_scheduled_timestamp_seconds gauge
node_shutdown_scheduled_timestamp_seconds{kind=reboot} 1725545703.588789

with the second, it does that:

# HELP node_shutdown_scheduled_timestamp_seconds time of the next scheduled reboot, or zero
# TYPE node_shutdown_scheduled_timestamp_seconds gauge
node_shutdown_scheduled_timestamp_seconds 0
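
Since the property is a (string, uint64) struct rather than a bare integer, a direct .(uint64) assertion on it would panic. The sketch below shows a defensive way to unpack it. It assumes the D-Bus library hands back the struct as a []interface{} of its fields, which is how godbus documents decoding structs into interface{}; that should be verified against the library version in use. The name parseScheduledShutdown is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// parseScheduledShutdown extracts the shutdown kind and the microsecond
// timestamp from a decoded ScheduledShutdown property value.
// Assumption: the (st) struct arrives as []interface{}{string, uint64}.
func parseScheduledShutdown(v interface{}) (string, uint64, error) {
	fields, ok := v.([]interface{})
	if !ok || len(fields) != 2 {
		return "", 0, errors.New("unexpected ScheduledShutdown value, want a (string, uint64) struct")
	}
	kind, ok := fields[0].(string)
	if !ok {
		return "", 0, errors.New("first ScheduledShutdown field is not a string")
	}
	usec, ok := fields[1].(uint64)
	if !ok {
		return "", 0, errors.New("second ScheduledShutdown field is not a uint64")
	}
	return kind, usec, nil
}

func main() {
	// Value shaped like the pending-reboot example from busctl.
	kind, usec, err := parseScheduledShutdown([]interface{}{"reboot", uint64(1725545703588789)})
	fmt.Println(kind, usec, err)

	// A bare integer is rejected with an error instead of a panic.
	_, _, err = parseScheduledShutdown(uint64(42))
	fmt.Println(err)
}
```

The kind string could then feed a label such as the kind label used by the script's output above.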

if err != nil {
level.Debug(c.logger).Log("msg", "couldn't get ScheduledShutdown", "err", err)
return errors.New("Couldn't get ScheduledShutdown property")
}
shutdownTimeUsec = timestampValue.Value.Value().(uint64)

ch <- prometheus.MustNewConstMetric(
c.systemShutdownDesc, prometheus.GaugeValue,
float64(shutdownTimeUsec)/1e6,
)
return nil
}

func (c *systemdCollector) collectUnitStartTimeMetrics(conn *dbus.Conn, ch chan<- prometheus.Metric, units []unit) {
var startTimeUsec uint64
