nvme: Add support for Autonomous Power State Transition #1444
base: main
Generally, I like this. I'm unsure what we should do by default...
@bsdimp, are you referring to the UX issue I described earlier, or deciding whether the feature should be enabled or disabled by default? If it's the former, another option is to set both tunables to -1 by default and take no action unless they are changed. This would avoid the inconsistency, though the code would be slightly less straightforward because it has to handle signed values. I can implement this if needed. If it's the latter, I strongly believe enabling the feature by default is not a good idea, due to the latency trade-offs and possible flaws in the higher power states. At the same time, disabling power saving when it works properly isn't ideal either, and if it doesn't work properly, that's not a FreeBSD issue. Adding support for the mechanism without enforcing it won't harm anyone and can only benefit users who care about power consumption.
This is a great feature. But I think we could implement it in user space, in nvmecontrol(8).
@wigneddoom This can easily be implemented in user space (in fact, most of it has already been done, as
Although my approach to configuring APST closely resembles the Linux implementation, I agree that its design could be improved. In particular, I fully agree with @wigneddoom about moving as much functionality as possible out of the kernel, though I'm still considering the best way to achieve this. I've marked this PR as a draft so I can take some time to give it more thought. Please remove the needs-review label for now.
The functions nvme_ctrlr_cmd_set_feature and nvme_ctrlr_cmd_get_feature already have payload and payload_size in their parameter lists, but these parameters were not used. However, they are necessary for some features, such as APST, that pass data through a data buffer. Signed-off-by: Alexey Sukhoguzov <[email protected]>
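To see why the payload parameters matter, here is a minimal sketch of what a Set Features request for APST has to carry. The struct `set_feature_req` and the function `apst_set_feature_req` are hypothetical and do not mirror the real `nvme_ctrlr_cmd_set_feature` signature; the values used (Feature Identifier 0Ch in CDW10, the APSTE enable bit in CDW11 bit 0, and a 256-byte transition table) follow the NVMe specification.

```c
#include <stddef.h>
#include <stdint.h>

/* NVMe Feature Identifier for Autonomous Power State Transition (0Ch). */
#define APST_FID          0x0Cu
/* The APST data structure: 32 power-state entries of 64 bits each. */
#define APST_NUM_ENTRIES  32
#define APST_PAYLOAD_SIZE (APST_NUM_ENTRIES * sizeof(uint64_t))

/*
 * Hypothetical, simplified view of a Set Features request; the real
 * driver builds a full NVMe command, not this type.
 */
struct set_feature_req {
	uint32_t    cdw10;        /* bits 7:0 = Feature Identifier */
	uint32_t    cdw11;        /* for APST, bit 0 = APSTE (enable) */
	const void *payload;      /* data buffer sent to the controller */
	size_t      payload_size; /* must cover the whole APST table */
};

/*
 * Build a request that enables (or disables) APST using the given
 * 32-entry table. Without a payload, the table cannot be delivered.
 */
static struct set_feature_req
apst_set_feature_req(const uint64_t table[APST_NUM_ENTRIES], int enable)
{
	struct set_feature_req req;

	req.cdw10 = APST_FID;
	req.cdw11 = enable ? 1u : 0u;
	req.payload = table;
	req.payload_size = APST_PAYLOAD_SIZE;
	return req;
}
```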
It turned out that I had discarded most of the previous code. Pretty much all that's left are Get/Set Feature calls if At this point, I don't think it's worth adding a command to
Signed-off-by: Alexey Sukhoguzov <[email protected]>
21e9258 to 45d3802
APST is an optional NVMe power-saving feature that allows devices to automatically enter higher non-operational power states after a certain amount of idle time, reducing the controller's overall power consumption. Setting up the feature requires sending an APST data structure (32 64-bit entries) through the data buffer. This payload can also be obtained from the controller itself, which preserves the default behaviour provided by the vendor. Signed-off-by: Alexey Sukhoguzov <[email protected]>
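Each of the 32 entries packs two values into one 64-bit word: the idle transition power state (ITPS) and the idle time prior to transition (ITPT). A minimal sketch of encoding and decoding an entry, assuming the NVMe layout (ITPS in bits 7:3, ITPT in milliseconds in bits 31:8, all other bits reserved); the helper names are hypothetical. The structure is little-endian on the wire, so a real driver would also need a byte-order conversion such as `htole64` on big-endian hosts.

```c
#include <stdint.h>

#define APST_NUM_ENTRIES 32

/* NVMe APST entry layout: bits 7:3 = ITPS, bits 31:8 = ITPT (ms). */
#define APST_ITPS_SHIFT 3
#define APST_ITPS_MASK  0x1Fu
#define APST_ITPT_SHIFT 8
#define APST_ITPT_MASK  0xFFFFFFu

/* Pack a target power state and an idle time (ms) into one entry. */
static uint64_t
apst_entry_encode(unsigned itps, uint32_t itpt_ms)
{
	return ((uint64_t)(itpt_ms & APST_ITPT_MASK) << APST_ITPT_SHIFT) |
	    ((uint64_t)(itps & APST_ITPS_MASK) << APST_ITPS_SHIFT);
}

/* Extract the power state this entry transitions to. */
static unsigned
apst_entry_itps(uint64_t entry)
{
	return (unsigned)((entry >> APST_ITPS_SHIFT) & APST_ITPS_MASK);
}

/* Extract the idle time (ms) required before the transition. */
static uint32_t
apst_entry_itpt(uint64_t entry)
{
	return (uint32_t)((entry >> APST_ITPT_SHIFT) & APST_ITPT_MASK);
}
```

Decoding is what makes reading the vendor's table back from the controller (via Get Features) useful: the same helpers apply to a payload obtained that way.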
APST data can be manually overridden using the apst_data tunable, which should be an array of unsigned integers. This method offers more flexibility than automatic generation, but suitable values may depend on the specific controller. Signed-off-by: Alexey Sukhoguzov <[email protected]>
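Because hand-written apst_data values are controller-specific, a driver might want to reject obviously malformed entries before sending them. A minimal sketch of such a check, assuming the NVMe entry layout (reserved bits 2:0 and 63:32, ITPS in bits 7:3) and that NPSS is the zero-based index of the deepest supported power state; `apst_table_check` is hypothetical and not part of the PR.

```c
#include <stddef.h>
#include <stdint.h>

#define APST_NUM_ENTRIES 32
/* Bits 2:0 and 63:32 of an APST entry are reserved. */
#define APST_RESERVED_MASK (UINT64_C(0xFFFFFFFF00000000) | UINT64_C(0x7))

/*
 * Hypothetical sanity check for a user-supplied table of n entries.
 * Returns 0 if every entry looks plausible, -1 if there are too many
 * entries, otherwise the 1-based index of the first bad entry.
 */
static int
apst_table_check(const uint64_t *table, size_t n, unsigned npss)
{
	if (n > APST_NUM_ENTRIES)
		return -1;
	for (size_t i = 0; i < n; i++) {
		unsigned itps = (unsigned)((table[i] >> 3) & 0x1F);

		/* Reserved bits must be zero. */
		if (table[i] & APST_RESERVED_MASK)
			return (int)i + 1;
		/* The target state must exist on this controller. */
		if (itps > npss)
			return (int)i + 1;
	}
	return 0;
}
```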
APST data can be generated automatically based on the apst_max_latency tunable. This method offers the same simple configuration as on Linux but only allows setting an upper latency limit. Signed-off-by: Alexey Sukhoguzov <[email protected]>
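A minimal sketch of Linux-style table generation under a latency cap. It walks from the deepest power state upward, remembers the shallowest acceptable non-operational state seen so far, and points every shallower state at it. The descriptor struct, the function name, and the acceptance rule (entry plus exit latency compared against the limit) are assumptions, and the PR may differ. `itpt_factor` corresponds to the multiplier later exposed as `apst_itpt_factor`; a factor of 50 gives the same idle timeout (about 50 times the total latency) that the Linux driver has historically used.

```c
#include <stdint.h>

#define APST_NUM_ENTRIES 32
#define APST_ITPT_MAX    0xFFFFFFu   /* ITPT is a 24-bit field (ms) */

/* Hypothetical subset of an NVMe power state descriptor. */
struct ps_desc {
	uint32_t enlat_us;   /* entry latency, microseconds */
	uint32_t exlat_us;   /* exit latency, microseconds */
	int      non_op;     /* non-zero for a non-operational state */
};

/*
 * Fill table[] for states 0..npss. A state is an acceptable target if it
 * is non-operational and its total latency does not exceed
 * max_latency_us. ITPT = itpt_factor * total latency, in ms, rounded up.
 */
static void
apst_generate(const struct ps_desc *ps, unsigned npss,
    uint64_t max_latency_us, uint64_t itpt_factor,
    uint64_t table[APST_NUM_ENTRIES])
{
	uint64_t target = 0;

	if (npss >= APST_NUM_ENTRIES)
		npss = APST_NUM_ENTRIES - 1;
	for (unsigned i = 0; i < APST_NUM_ENTRIES; i++)
		table[i] = 0;

	for (int state = (int)npss; state >= 0; state--) {
		uint64_t total_us, itpt_ms;

		/* Shallower states transition to the best target so far. */
		table[state] = target;
		if (!ps[state].non_op)
			continue;
		total_us = (uint64_t)ps[state].enlat_us + ps[state].exlat_us;
		if (total_us > max_latency_us)
			continue;   /* cut off: too slow to enter and leave */
		itpt_ms = (itpt_factor * total_us + 999) / 1000;
		if (itpt_ms > APST_ITPT_MAX)
			itpt_ms = APST_ITPT_MAX;
		target = ((uint64_t)state << 3) | (itpt_ms << 8);
	}
}
```

Raising the latency cap lets deeper states become targets, which changes both which state is chosen and how long the controller waits before entering it.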
I've further split up the commits, made some doc fixes, and slightly clarified the control flow. I also added back automatic APST data generation (i.e. In any case, everything seems to be ready for review. Apologies for the noise above; I should have kept this PR as a draft until now.
APST is an optional NVMe power-saving feature that allows devices to automatically enter higher non-operational power states after a certain amount of idle time, reducing the controller's overall power consumption.
Configuring the feature involves filling in the transition table, which is then sent to the controller in a data buffer. Each table entry corresponds to one of the available power states and contains two values: the idle transition power state (ITPS) and the idle time prior to transition (ITPT). The first specifies the power state the controller should switch to next, and the second specifies how much idle time must elapse before that switch.
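To make the ITPS/ITPT semantics concrete, here is a minimal sketch that walks a table and reports which power state the controller would reach after a given stretch of continuous idle time. It assumes ITPT counts idle time spent in the current state (as the NVMe specification describes) and treats an ITPT of 0 as "no further transition"; `apst_state_after_idle` is hypothetical and models the behaviour rather than any driver code.

```c
#include <stdint.h>

#define APST_NUM_ENTRIES 32

/*
 * Return the power state reached after idle_ms of continuous idleness,
 * starting from `state`. Each entry means: after ITPT ms idle in this
 * state, move to state ITPS.
 */
static unsigned
apst_state_after_idle(const uint64_t table[APST_NUM_ENTRIES],
    unsigned state, uint64_t idle_ms)
{
	/* At most 32 hops, which guards against a cyclic, malformed table. */
	for (int hops = 0; hops < APST_NUM_ENTRIES; hops++) {
		uint64_t itpt, entry;
		unsigned itps;

		if (state >= APST_NUM_ENTRIES)
			break;
		entry = table[state];
		itpt = (entry >> 8) & 0xFFFFFF;
		itps = (unsigned)((entry >> 3) & 0x1F);

		if (itpt == 0 || idle_ms < itpt)
			break;          /* stays in the current state */
		idle_ms -= itpt;    /* idle time consumed in this state */
		state = itps;
	}
	return state;
}
```

For example, with state 0 set to move to state 3 after 100 ms and state 3 set to move to state 4 after a further 2000 ms, the controller is still in state 0 after 50 ms, in state 3 after 100 ms, and in state 4 after 2100 ms.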
Two sysctls are added:

- `apst_itpt_factor`, for ITPT calculation (an integer by which the total latency is multiplied to get a suitable transition flow), and
- `apst_max_latency`, for cutting off higher states with unwanted latency (by specifying a maximum value in microseconds).

The thing I struggled with the most was the default settings. I know that different controllers set them differently, and it's probably not a good idea to change them without the user's explicit intent. The best solution I could come up with is to check the return values of the `TUNABLE_INT_FETCH` calls and, depending on those, decide whether to touch the feature or not. The issue here is that if the feature is enabled by the vendor, the sysctls will show zero values by default, which is misleading. I'm not sure whether this is a problem, but a more elegant solution may well exist.
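One way to combine the "only touch APST if the user asked" rule with the -1 default suggested earlier in the thread is sketched below: both values start at -1, and the driver reprograms APST only if at least one tunable was actually set. The `tunable` struct and `fetch_int` helper are hypothetical stand-ins for a real tunable lookup such as `TUNABLE_INT_FETCH`; this is an illustration of the decision logic, not the PR's code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a loader tunable. */
struct tunable {
	bool set;     /* did the user set it? */
	int  value;
};

/* Store the value and return true only if the tunable was set. */
static bool
fetch_int(const struct tunable *t, int *out)
{
	if (!t->set)
		return false;
	*out = t->value;
	return true;
}

/* -1 means "not configured by the user" and is what a sysctl would show. */
struct apst_conf {
	int itpt_factor;
	int max_latency_us;
};

/*
 * Returns true if the driver should reprogram APST. When neither tunable
 * was set, the vendor's configuration is left untouched and both values
 * stay at -1 instead of a misleading 0.
 */
static bool
apst_conf_init(struct apst_conf *c, const struct tunable *factor,
    const struct tunable *max_lat)
{
	bool touched = false;

	c->itpt_factor = -1;
	c->max_latency_us = -1;
	if (fetch_int(factor, &c->itpt_factor))
		touched = true;
	if (fetch_int(max_lat, &c->max_latency_us))
		touched = true;
	return touched;
}
```

The cost, as noted in the thread, is that every consumer of these values has to handle the signed sentinel before using them in latency arithmetic.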