dump: Don't unfreeze tasks on dump failure with --no-resume-on-error. #2215
base: criu-dev
Branch updated: 6c2c50d to ae821a3
Codecov Report
Coverage diff, criu-dev vs. #2215:
Coverage: 70.51% -> 70.51%
Files: 133 -> 133
Lines: 33534 -> 33539 (+5)
Hits: 23646 -> 23650 (+4)
Misses: 9888 -> 9889 (+1)
@osctobe Would it be possible to add a test for this functionality?
There are no tests for --leave-stopped or --leave-running yet that could be extended with this case. The change is tested in production (always enabled), though.
Here is the test for --leave-stopped:
I am sorry, but it doesn't work this way. I think our fault injection engine can be used to introduce a test. test/jenkins/criu-fault.sh contains all these tests.
if (ret || post_dump_ret || opts.final_state == TASK_ALIVE) {
if (opts.resume_on_dump_error && (ret || post_dump_ret))
	opts.final_state = TASK_ALIVE;
if (opts.final_state == TASK_ALIVE) {
I think we need to check that final_state isn't TASK_DEAD here. If final_state is TASK_STOPPED, we have to do all these actions, right?
Yes, it is kinda strange that we don't unlock the network for --leave-stopped. One would not be able to just resume such a stopped container with SIGCONT, but would also need to manually unlock the network, remove link remaps, and do other cleanups.
Branch updated: daaddc6 to c390cc2
You keep adding
*--no-resume-on-error*::
	Leave tasks in stopped state even if checkpoint completed unsuccessfully.
Could you add some lines about the situations in which it is useful, and why? What benefit does it provide?
How should users decide which behavior they prefer when criu fails?
The current doc lines are copied and pasted from above without adding any useful context.
Branch updated: 105f912 to eef7254
Branch updated: 2a95bc6 to 0e9ab38
See the freezer_restore_state() related code. If you put your processes in a freezer cgroup and make it FROZEN before the dump, you can decide after the dump finishes whether to make the cgroup THAWED (no dump failure) or leave it frozen (on dump failure). This effectively accomplishes the same thing as this new option.
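The freeze-before-dump approach described above could look roughly like the following from the caller's side. This is a sketch under assumptions, not CRIU code: it assumes a cgroup-v1 freezer whose `freezer.state` file accepts the strings `FROZEN` and `THAWED`, and the three helper names are hypothetical. The state file path is passed in by the caller (on a typical v1 setup it would sit under the freezer hierarchy mount).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a state string ("FROZEN" or "THAWED") to a freezer.state file.
 * Returns 0 on success, -1 on error. */
static int set_freezer_state(const char *state_path, const char *state)
{
	assert(state_path != NULL && state != NULL);
	FILE *f = fopen(state_path, "w");
	if (!f)
		return -1;
	int ok = fputs(state, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the current state into buf, dropping a trailing newline.
 * Returns 0 on success, -1 on error. */
static int get_freezer_state(const char *state_path, char *buf, int len)
{
	FILE *f = fopen(state_path, "r");
	if (!f)
		return -1;
	char *line = fgets(buf, len, f);
	fclose(f);
	if (!line)
		return -1;
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* After the dump: thaw on success, leave the cgroup frozen on failure. */
static int finish_after_dump(const char *state_path, int dump_ret)
{
	if (dump_ret != 0)
		return 0; /* dump failed: keep tasks FROZEN for inspection */
	return set_freezer_state(state_path, "THAWED");
}
```

A caller would invoke `set_freezer_state(path, "FROZEN")` before starting the dump and `finish_after_dump(path, dump_exit_code)` afterwards. On a real freezer cgroup a read right after requesting `FROZEN` may still report `FREEZING`, so a robust caller would poll until the state settles.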
@osctobe could you respond to the comments?
Branch updated: 962791d to 3d261c0
Branch updated: bf35bab to 40f34f5
Branch updated: 99345dc to 3f66726
Branch updated: 3f66726 to 038e75f
Branch updated: 038e75f to 284f718
A friendly reminder that this PR had no activity for 30 days.
Branch updated: 284f718 to c6d4b6c
Make it possible to kill or leave stopped tasks if a dump failed after stopping the tree.
Signed-off-by: Michał Mirosław <[email protected]>