-
Notifications
You must be signed in to change notification settings - Fork 585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kerndat: Skip clone3(set_tid) when unprivileged. #2252
base: criu-dev
Are you sure you want to change the base?
Conversation
d4190de
to
dfcd3d5
Compare
dfcd3d5
to
e25c36b
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## criu-dev #2252 +/- ##
============================================
- Coverage 70.51% 70.50% -0.01%
============================================
Files 133 133
Lines 33534 33534
============================================
- Hits 23646 23643 -3
- Misses 9888 9891 +3 ☔ View full report in Codecov by Sentry. |
e25c36b
to
e42b099
Compare
criu/kerndat.c
Outdated
@@ -1384,12 +1384,15 @@ static bool kerndat_has_clone3_set_tid(void) | |||
pid_t pid; | |||
struct _clone_args args = {}; | |||
|
|||
kdat.has_clone3_set_tid = false; | |||
if (opts.unprivileged && !has_cap_checkpoint_restore(opts.cap_eff)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretend it's not available instead of failing restore later when running unprivileged.
Can restore succeed if kdat.has_clone3_set_tid isn't set and criu is running w/o cap_sys_admin and cap_cr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It does it using the old method: by using ns_last_pid
(via a RPC helper).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's write a comment to explain this in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this isn't a right solution for the problem:
- process can be restored in a separate user and pid namespaces and clone3() works fine in this case.
- this function checks whether clone3 is supported or not, and it is separate from the dump/restore code. I think we have to move these checks to the code that restores processes with specified pids.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we should we treat EPERM here as success (clone3 with set_tid supported)? (And add a fallback path in the restoring code. [separate change])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, you are right.
4ae575c
to
6a97988
Compare
b74e9a1
to
06819e6
Compare
06819e6
to
9bc0dbf
Compare
9bc0dbf
to
e7a4178
Compare
Hello @avagin and @osctobe, I had one idea, which I like a lot, but guys in mainstream Linux faced it with total silence. The idea was to allow clone3 syscall to alter owner user namespace of newly created namespaces (e.g. new pid namespace owner if CLONE_NEWPID is specified). https://lore.kernel.org/all/[email protected]/ This way using clone3 CRIU is able to create all restored processes in topmost user namespace available, while preserving namespace ownership topology. So at each clone3 call we would have all permissions needed by clone3_set_tid functionality. (Later we can switch to proper user namespace for each process to also preserve task's user namespaces.) I believe my fix would help in this case too. @osctobe Can you, please, give it a try on your environment? |
e7a4178
to
d8d18f7
Compare
A friendly reminder that this PR had no activity for 30 days. |
d8d18f7
to
c8082ce
Compare
clone3(set_tid) requires CAP_CHECKPOINT_RESTORE we might not have. Assume that if it errored out with EPERM it's there and might be usable from inside a user namespace. Signed-off-by: Michał Mirosław <[email protected]>
c8082ce
to
ff088a7
Compare
A set of fixes for kerndat tests and a few debug logging improvements.