Allow unshare(CLONE_NEWUSER|CLONE_NEWNS|CLONE_NEWUTS) syscall #4
CLONE_NEWUSER does not seem covered in this PR.
Anyway, it might still be scary to allow CLONE_NEWUSER by default, given its history of vulnerabilities (CVE-2023-0386, etc.)
The Docker documentation lists unshare as a thing that is blocked by default, but doesn't provide an example of how an authorized user would unblock it or an argument to it for a particular workload. It just says you can pass a seccomp profile (which it seems to expect you to already be competent to create based on the default one) or you can use --security-opt seccomp=unconfined.
Given the number of people who actually know how to write a seccomp profile, and that there isn't a slightly-more-permissive one included as an available option, probably the vast majority of workloads that need user namespaces are currently running completely unconfined.
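For reference, a rule of the kind discussed here might look like the fragment generated below. This is a minimal sketch, not the change made in this PR: `arg` and `rule` are hypothetical local types, the JSON field names and the `SCMP_ACT_ALLOW` / `SCMP_CMP_EQ` strings follow the Docker seccomp profile format as I understand it, and the exact-match flag combination is simply the one from the PR title. A usable profile would still need the rest of the default profile around this entry.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Namespace flag values as defined in <linux/sched.h>.
const (
	cloneNewNS   = 0x00020000
	cloneNewUTS  = 0x04000000
	cloneNewUser = 0x10000000
)

// arg and rule are hypothetical local types for illustration only;
// they mirror the shape of one entry in a seccomp profile's "syscalls" list.
type arg struct {
	Index uint   `json:"index"`
	Value uint64 `json:"value"`
	Op    string `json:"op"`
}

type rule struct {
	Names  []string `json:"names"`
	Action string   `json:"action"`
	Args   []arg    `json:"args"`
}

// unshareRule allows unshare() only when its first argument is exactly
// CLONE_NEWUSER|CLONE_NEWNS|CLONE_NEWUTS (an assumption taken from the PR title).
func unshareRule() rule {
	return rule{
		Names:  []string{"unshare"},
		Action: "SCMP_ACT_ALLOW",
		Args: []arg{{
			Index: 0,
			Value: cloneNewUser | cloneNewNS | cloneNewUTS,
			Op:    "SCMP_CMP_EQ",
		}},
	}
}

func main() {
	out, err := json.MarshalIndent(unshareRule(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The printed fragment would be added to the `syscalls` array of a copy of the default profile, which is then passed with `--security-opt seccomp=<file>`.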
> CLONE_NEWUSER does not seem covered in this PR.
Whoops, I somehow messed up the PR and did CLONE_NEWUTS twice.
> Anyway, it might be still scary to allow CLONE_NEWUSER by default, due to its several vulnerabilities in the past (CVE-2023-0386, etc.)
There is a long discussion in moby/moby#42441 about the safety of these. I just wanted to make a PR that (if this feature is declared safe) would implement the change.
As I said, my specific use case is to make Buildah work. Building with Buildah inside a container will always be at least as safe (and normally much safer) than building with docker build outside of a container, even if a future vulnerability is found in unshare.
I agree with @adamnovak that writing a custom seccomp policy is too hard.
> probably the vast majority of workloads that need user namespaces are currently running completely unconfined
Or, worse yet, use custom builds of applications with sandboxing disabled (https://bugs.passt.top/show_bug.cgi?id=116#c6).
seccomp/default_linux.go
Outdated
{
    Index: 0,
    Value: unix.CLONE_NEWNS,
    ValueTwo: unix.CLONE_NEWNS,
Why is ValueTwo needed?
You're right, it is not needed. This is the first day I am writing seccomp policies :) Removed.
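As background on when a second value matters, here is a small simulation of how an argument rule is evaluated. It is only a sketch, assuming the usual libseccomp semantics: a plain equality comparison looks at `value` alone, while a masked-equality comparison treats `value` as the mask and `valueTwo` as the expected result. The `argRule` type and the `opEqualTo` / `opMaskedEqual` names are hypothetical stand-ins, not the types from `seccomp/default_linux.go`.

```go
package main

import "fmt"

// Namespace flag values as defined in <linux/sched.h>.
const (
	cloneNewNS   uint64 = 0x00020000
	cloneNewUTS  uint64 = 0x04000000
	cloneNewUser uint64 = 0x10000000
)

type op int

const (
	opEqualTo     op = iota // arg == value
	opMaskedEqual           // (arg & value) == valueTwo
)

// argRule is a hypothetical stand-in for one argument condition in a profile.
type argRule struct {
	value    uint64
	valueTwo uint64
	op       op
}

// matches reports whether a syscall argument satisfies the rule.
func (r argRule) matches(a uint64) bool {
	switch r.op {
	case opEqualTo:
		// valueTwo is never consulted for a plain equality check.
		return a == r.value
	case opMaskedEqual:
		// value acts as the mask, valueTwo as the expected masked result.
		return a&r.value == r.valueTwo
	}
	return false
}

func main() {
	all := cloneNewUser | cloneNewNS | cloneNewUTS

	exact := argRule{value: all, op: opEqualTo}
	fmt.Println("exact rule, all three flags:", exact.matches(all))
	fmt.Println("exact rule, CLONE_NEWNS only:", exact.matches(cloneNewNS))

	// Matches only when CLONE_NEWUSER is absent from the argument.
	noUser := argRule{value: cloneNewUser, valueTwo: 0, op: opMaskedEqual}
	fmt.Println("no-userns rule, NEWNS|NEWUTS:", noUser.matches(cloneNewNS|cloneNewUTS))
	fmt.Println("no-userns rule, all three flags:", noUser.matches(all))
}
```

Under this model, setting `valueTwo` on an equality rule has no effect, which is consistent with dropping it from the hunk above.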
This syscall is required for multiple use cases, one of them is Buildah.
See moby/moby#42441
Signed-off-by: Marat Radchenko <[email protected]>
By the way, I just found this one after moving #5 to the right repository (yesterday, I filed moby/moby#51130). It comes after a passt user happily reported at https://bugs.passt.top/show_bug.cgi?id=116 that they're now finally running a custom build of passt with a ton of security features commented out, so that it runs under Docker. I'm not sure what to do with my pull request. On one hand, it comes with an explanatory diagram, and it's a bit different from this one. On the other hand, this one came first and I guess it could be amended for what passt and pasta need. @slonopotamus what should I do?
Me neither. My personal opinion is that it's not Docker's job to guard the kernel against its own bugs and it should just allow the full
Yeah, same here. In any case, I'm trying to... try out your patch (checking passt in a Docker container), and I'm facing some issues because it looks like building Docker now requires Docker itself (!) but I'm not really keen on installing third-party packages. Anyway, I managed to run that with some hacks, but now I only have a new
Hmm, okay, I guess the BPF program (the seccomp profile) is actually loaded by I'm getting: I'm not sure why. I guess I'll try dumping the BPF program with
I tried with my version and now I'm getting EACCES instead of EPERM (which is what the BPF program would return), so there's something else. A change to the seccomp profile is needed in any case, but I would like to test this properly, so I'm going back and asking the reporter of https://bugs.passt.top/show_bug.cgi?id=116 about anything else that was needed to run passt other than not loading a seccomp profile. If you want to try this out, by the way, just install
Practically you can just use
I didn't know, that makes things much simpler, thanks!
This syscall is required for multiple use cases, one of them is Buildah.
See moby/moby#42441
Also, see https://github.com/containers/buildah/blob/b74149334e3ca3d1898f4e46f4ea94db60d14eaa/chroot/run_linux.go#L152-L160
Tested by building Moby with these changes and successfully running Buildah inside Docker.
Without these changes, it fails: