Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add --pid-file option to write PID files #25474

Open
wants to merge 1 commit into
base: main-2.x
Choose a base branch
from

Conversation

gwossum
Copy link
Member

@gwossum gwossum commented Oct 17, 2024

Add --pid-file option to write PID files on startup. The PID filename is specified by the argument after --pid-file.

Example: influxd --pid-file /var/lib/influxd/influxd.pid

PID files are automatically removed when the influxd process is shutdown.

Closes: #25473

Add `--pid-file` option to write PID files on startup. The PID filename
is specified by the argument after `--pid-file`.

Example: `influxd --pid-file /var/lib/influxd/influxd.pid`

PID files are automatically removed when the influxd process is shutdown.

Closes: 25473
@gwossum gwossum added area/configuration kind/feature area/2.x OSS 2.0 related issues and PRs labels Oct 17, 2024
@gwossum gwossum self-assigned this Oct 17, 2024
@gwossum gwossum marked this pull request as ready for review October 18, 2024 14:32
o.PIDFile = pidFilename
})
defer func() {
l.ShutdownOrFail(t, ctx)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran in to a similar thing as this recently: #25398 (comment)

I think that this method call ShutdownOrFail will get called immediately? Might be good to check but this may not be the case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's inside a lambda, so it will not be evaluated until the defer calls the lambda. Also note that if it got called immediately the require.FileExists() on line 184 would fail because the PID file gets removed when l.ShutdownOrFail(t, ctx) is called.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devanbenz - the arguments to a deferred function are evaluated on the execution of the defer line, and those evaluated arguments are stored for the eventual call, but the deferred call itself doesn't happen until after the end of the closing method.

@devanbenz devanbenz self-requested a review October 18, 2024 17:34
Copy link
Contributor

@davidby-influx davidby-influx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some error message changes, and a question about handling a (currently) undetected error.


// Create directory to PIDfile if needed.
if err := os.MkdirAll(filepath.Dir(pidFile), 0777); err != nil {
return fmt.Errorf("mkdir: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add the path in the fmt.Errorf

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The context of the PID filename is added by the caller:

if err := m.writePIDFile(opts.PIDFile); err != nil {
return fmt.Errorf("error writing PIDFile %q: %w", opts.PIDFile, err)
}

I figured this was acceptable elimination of code inside writePIDFile since it is a private, single-use method. For a public method I definitely would have added the filename into each message.

if writeErr := os.WriteFile(pidFile, []byte(pidStr), 0666); writeErr != nil {
// Let's make sure we don't leave a PID file behind on error.
removeErr := os.Remove(pidFile)
return fmt.Errorf("write file: %w; remove file: %w", writeErr, removeErr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add the path to the error message

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we want to use something lower level than WriteFile like OpenFile so we can pass in O_EXCL and detect existing PID files that weren't cleaned up (or multiple InfluxDB instances). I think that might be an error we want to detect, report, and possibly abort on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to find using the PID file as a lock file more annoying than useful (looking at you PostgreSQL!). It also generates tons of support questions.

The less annoying way of preventing multiple instances is to check for for already listening sockets. We kind of do that today by aborting if we can't bind to the bind addresses. We do the abort after we've opened the PID file, so we would blow away the PID file on shutdown in those situations even though the other (presumed) influxd instance is running.

If we want more robust handling of the PID file if you accidentally launch multiple influxd instances with the same configuration, my suggestion would be a feature request to try connecting to the bind addresses very early on to see if anything is listening on them, or use a Unix domain socket / named pipe for the check. This way when the application dies, there is nothing listening or responding on the socket. With a PID or lock file, the file can be stay around if the application is not shut down cleanly.

Or we could simply write the PID after we bind the sockets.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, I don't like the idea of dropping the PID file after we're listening for connections, because that could put a long delay between influxd starts and when the PID file drops.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion then, to avoid support questions, would be to check for existence, issue a warning if present, and then continue, rather than a fatal error. That will let the customer know that the WAL flush may not have happened on shutdown, for instance.

label: "pidfile",
closer: func(context.Context) error {
if err := os.Remove(pidFile); err != nil {
return fmt.Errorf("removing PID file %q: %w", pidFile, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the way!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the caller to the cleanup function has no context, so the error must contain the filename.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants