| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, elogind uses either the legacy hierarchies or the unified hierarchy.
When the legacy hierarchies are used, elogind uses a named legacy hierarchy
mounted on /sys/fs/cgroup/elogind without any kernel controllers for process
management. Due to the shortcomings in the legacy hierarchy, this involves a
lot of workarounds and complexities.
Because the unified hierarchy can be mounted and used in parallel to legacy
hierarchies, there's no reason for elogind to use a legacy hierarchy for
management even if the kernel resource controllers need to be mounted on legacy
hierarchies. It can simply mount the unified hierarchy under
/sys/fs/cgroup/elogind and use it without affecting other legacy hierarchies.
This disables a significant amount of fragile workaround logics and would allow
using features which depend on the unified hierarchy membership such bpf cgroup
v2 membership test. In time, this would also allow deleting the said
complexities.
This patch updates elogind so that it prefers the unified hierarchy for the
elogind cgroup controller hierarchy when legacy hierarchies are used for kernel
resource controllers.
* cg_unified(@controller) is introduced which tests whether the specific
controller in on unified hierarchy and used to choose the unified hierarchy
code path for process and service management when available. Kernel
controller specific operations remain gated by cg_all_unified().
* "elogind.legacy_elogind_cgroup_controller" kernel argument can be used to
force the use of legacy hierarchy for elogind cgroup controller.
* nspawn: By default nspawn uses the same hierarchies as the host. If
UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all. If
0, legacy for all.
* nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of
three options - legacy, only elogind controller on unified, and unified. The
value is passed into mount setup functions and controls cgroup configuration.
* nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount
option is moved to mount_legacy_cgroup_hierarchy() so that it can take an
appropriate action depending on the configuration of the host.
v2: - CGroupUnified enum replaces open coded integer values to indicate the
cgroup operation mode.
- Various style updates.
v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2.
v4: Restored legacy container on unified host support and fixed another bug in
detect_unified_cgroup_hierarchy().
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
daemons, which wish to transition state from the initramfs to the real
root, might use /dev/shm for their state.
As /dev is not relabeled across mount points, /dev/shm has to be
relabled explicitly.
|
| |
|
|
|
|
|
|
|
| |
We really shouldn't fail silently, but print a log message about these errors. Also make sure to attach error codes to
all log messages where that makes sense.
(While we are at it, add a couple of (void) casts to functions where we knowingly ignore return values.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code introduced in f8c1a81c51 (= elogind 227) failed for me with:
Failed to copy smack label from net_cls to /sys/fs/cgroup/net_cls: No such file or directory
There is no need for a symlink in this case because source and target
are identical. The symlink() call is allowed to fail when the target
already exists. When that happens, copying the Smack label must be
skipped.
But the code also failed when there is a symlink, like "cpu ->
cpu,cpuacct", because mac_smack_copy() got called with
src="cpu,cpuacct" which fails to find the entry because the current
directory is not inside /sys/fs/cgroup. The absolute path to the existing
entry must be used instead.
|
| |
|
| |
|
|
|
|
|
| |
Apply remaining fixes and the performed move of utility functions
into their own foo-util.[hc] files on the rest of elogind.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Make sure to propagate error codes from mount-loops correctly. Right now,
we return the return-code of the first mount that did _something_. This is
not what we want. Make sure we return an error if _any_ mount fails (and
then make sure to return the first error to not hide proper errors due to
consequential errors like -ENOTDIR).
Reported by cee1 <fykcee1@gmail.com>.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Even though elogind has its own smack label since
'--with-smack-run-label' configuration is set, the smack label of each
CGROUP root directory should have the star (i.e. *) label. This is
mainly because current Linux Kernel set the label in this way.
(Refer to smack_d_instantiate() in security/smack/smack_lsm.c)
However, if elogind has its own smack label and arg_join_controllers is
explicitly set or initialized by initialize_join_controllers() function,
current elogind creates the symlink in CGROUP root directory with its
own smack label as below.
lrwxrwxrwx. 1 root root System 11 Dec 31 16:00 cpu -> cpu,cpuacct
dr-xr-xr-x. 4 root root * 0 Dec 31 16:01 cpu,cpuacct
lrwxrwxrwx. 1 root root System 11 Dec 31 16:00 cpuacct -> cpu,cpuacct
This patch fixes that bug by copying the smack label from the origin.
|
|
|
|
|
| |
This is done for systems, which init systems are no cgroup
controllers. One example is runit on Void Linux.
|
| |
|
|
|
|
|
|
| |
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
|
|
|
|
|
|
|
| |
Current systemd requires kernel >= 3.7 per the README file
but CONFIG_USB_DEVICEFS disappeared from the kernel in
upstream commit fb28d58b72aa9215b26f1d5478462af394a4d253
(kernel 3.5-rc1)
|
|
|
|
| |
them anymore
|
| |
|
|
|
|
|
|
|
| |
for the container's own subtree in the name=systemd hierarchy
More specifically mount all other hierarchies in their entirety and the
name=systemd above the container's subtree read-only.
|
|
|
|
|
| |
Using the same scripts as in f647962d64e "treewide: yet more log_*_errno
+ return simplifications".
|
|
|
|
|
|
|
|
|
|
|
| |
If the format string contains %m, clearly errno must have a meaningful
value, so we might as well use log_*_errno to have ERRNO= logged.
Using:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\((".*%m.*")/log_\1_errno(errno, \2/'
Plus some whitespace, linewrap, and indent adjustments.
|
| |
|
|
|
|
|
|
| |
Turns out we can just do kmod_setup() earlier, before we do mount_setup(),
so there's no need for mount_setup_late() anymore. Instead, put kdbusfs in
mount_table[].
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
kdbus has seen a larger update than expected lately, most notably with
kdbusfs, a file system to expose the kdbus control files:
* Each time a file system of this type is mounted, a new kdbus
domain is created.
* The layout inside each mount point is the same as before, except
that domains are not hierarchically nested anymore.
* Domains are therefore also unnamed now.
* Unmounting a kdbusfs will automatically also detroy the
associated domain.
* Hence, the action of creating a kdbus domain is now as
privileged as mounting a filesystem.
* This way, we can get around creating dev nodes for everything,
which is last but not least something that is not limited by
20-bit minor numbers.
The kdbus specific bits in nspawn have all been dropped now, as nspawn
can rely on the container OS to set up its own kdbus domain, simply by
mounting a new instance.
A new set of mounts has been added to mount things *after* the kernel
modules have been loaded. For now, only kdbus is in this set, which is
invoked with mount_setup_late().
|
|
|
|
| |
new mac_{smack,selinux,apparmor}_xyz() convention
|
|
|
|
|
|
| |
This is also the only place where FTW_ACTIONRETVAL is used, so
this makes systemd compile without SELinux or SMACK support
when the standard library doesn't support this extension.
|
|
|
|
|
|
|
|
|
| |
It is redundant to store 'hash' and 'compare' function pointers in
struct Hashmap separately. The functions always comprise a pair.
Store a single pointer to struct hash_ops instead.
systemd keeps hundreds of hashmaps, so this saves a little bit of
memory.
|
|
|
|
| |
http://lists.freedesktop.org/archives/systemd-devel/2014-August/021772.html
|
|
|
|
| |
Failure to mount cgroups with xattr should not be fatal
|
|
|
|
|
|
| |
the "/sys" is mounted. So we should mount "sys" before "/proc"
https://bugs.freedesktop.org/show_bug.cgi?id=77646
|
| |
|
|
|
|
|
| |
We should no longer pretend that we can run in any sensible way
without the kernel supporting us with cgroups functionality.
|
| |
|
|
|
|
|
|
| |
Given that glibc searches for /dev/shm by just looking for any tmpfs we
should be more careful with providing tmpfs instances arbitrary code
might end up writing to.
|
| |
|
|
|
|
|
|
| |
Similar to PrivateNetwork=, PrivateTmp= introduce PrivateDevices= that
sets up a private /dev with only the API pseudo-devices like /dev/null,
/dev/zero, /dev/random, but not any physical devices in them.
|
|
|
|
|
|
| |
Also for log_error() except where a specific error is specified
e.g. errno ? strerror(errno) : "Some user specified message"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since on most systems with xattr systemd will compile with Smack
support enabled, we still attempt to mount various fs's with
Smack-only options.
Before mounting any of these Smack-related filesystems with
Smack specific mount options, check if Smack is functionally
active on the running kernel.
If Smack is really enabled in the kernel, all these Smack mounts
are now *fatal*, as they should be.
We no longer mount smackfs if systemd was compiled without
Smack support. This makes it easier to make smackfs mount
failures a critical error when Smack is enabled.
We no longer mount these filesystems with their Smack specific
options inside containers. There these filesystems will be
mounted with there non-mount smack options for now.
|
|
|
|
|
|
|
|
|
|
|
| |
Once systemd itself is running in a security domain for SMACK,
it will fail to start countless tasks due to missing privileges
for mounted and created directory structures. For /run and shm
specifically, we grant all tasks access.
These 2 mounts are allowed to fail, which will happen if the
system is not running a SMACK enabled kernel or security=none is
passed to the kernel.
|
|
|
|
|
|
|
| |
dracut uses systemd in the initramfs and does not write these files
anymore.
The state of the root fsck is serialized.
|
| |
|
|
|
|
| |
Freeing in error path is the common pattern with set_put().
|
|
|
|
|
| |
It's polite to print the name of the link that wasn't created,
and it makes little sense to print the target.
|
|
|
|
|
|
|
|
|
| |
xattrs on cgroup fs were added back in v3.6-rc3-3-g03b1cde. But we
support kernels >= 2.6.39, and we should also support kernels compiled
w/o xattr support, even if systemd is compiled with xattr support.
Fall back to mounting without xattr support.
Tested-by: Colin Walters <walters@verbum.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
All attributes are stored as text, since root_directory is already
text, and it seems easier to have all of them in text format.
Attributes are written in the trusted. namespace, because the kernel
currently does not allow user. attributes on cgroups. This is a PITA,
and CAP_SYS_ADMIN is required to *read* the attributes. Alas.
A second pipe is opened for the child to signal the parent that the
cgroup hierarchy has been set up.
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of outputting "5h 55s 50ms 3us" we'll now output "5h
55.050003s". Also, while outputting the accuracy is configurable.
Basically we now try use "dot notation" for all time values > 1min. For
>= 1s we use 's' as unit, otherwise for >= 1ms we use 'ms' as unit, and
finally 'us'.
This should give reasonably values in most cases.
|
| |
|
|
|
|
|
|
|
|
| |
All Execs within the service, will get mounted the same
/tmp and /var/tmp directories, if service is configured with
PrivateTmp=yes. Temporary directories are cleaned up by service
itself in addition to systemd-tmpfiles. Directory which is mounted
as inaccessible is created at runtime in /run/systemd.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we were testing whether /sys/fs/cgroup/systemd/ was a mount
point. This might be problematic however, when the cgroup trees are bind
mounted into a container from the host (which should be absolutely
valid), which might create the impression that the container was running
systemd, but only the host actually is.
Replace this by a check for the existance of the directory
/run/systemd/system/, which should work unconditionally, since /run can
never be a bind mount but *must* be a tmpfs on systemd systems, which is
flushed at boots. This means that data in /run always reflects
information about the current boot, and only of the local container,
which makes it the perfect choice for a check like this.
(As side effect this is nice to Ubuntu people who now use logind with
the systemd cgroup hierarchy, where the old sd_booted() check misdetects
systemd, even though they still run legacy Upstart.)
|