path: root/src/core/mount-setup.c
Commit message (Author, Age)
* core: use the unified hierarchy for the elogind cgroup controller hierarchy (Tejun Heo, 2017-07-05)
| Currently, elogind uses either the legacy hierarchies or the unified hierarchy. When the legacy hierarchies are used, elogind uses a named legacy hierarchy mounted on /sys/fs/cgroup/elogind without any kernel controllers for process management. Due to the shortcomings in the legacy hierarchy, this involves a lot of workarounds and complexities.
|
| Because the unified hierarchy can be mounted and used in parallel to legacy hierarchies, there's no reason for elogind to use a legacy hierarchy for management even if the kernel resource controllers need to be mounted on legacy hierarchies. It can simply mount the unified hierarchy under /sys/fs/cgroup/elogind and use it without affecting other legacy hierarchies. This disables a significant amount of fragile workaround logic and would allow using features which depend on unified hierarchy membership, such as the bpf cgroup v2 membership test. In time, this would also allow deleting the said complexities.
|
| This patch updates elogind so that it prefers the unified hierarchy for the elogind cgroup controller hierarchy when legacy hierarchies are used for kernel resource controllers.
|
| * cg_unified(@controller) is introduced, which tests whether the specific controller is on the unified hierarchy and is used to choose the unified hierarchy code path for process and service management when available. Kernel controller specific operations remain gated by cg_all_unified().
| * The "elogind.legacy_elogind_cgroup_controller" kernel argument can be used to force the use of the legacy hierarchy for the elogind cgroup controller.
| * nspawn: By default nspawn uses the same hierarchies as the host. If UNIFIED_CGROUP_HIERARCHY is set to 1, the unified hierarchy is used for all. If 0, legacy for all.
| * nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of three options: legacy, only elogind controller on unified, and unified. The value is passed into mount setup functions and controls cgroup configuration.
| * nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount option is moved to mount_legacy_cgroup_hierarchy() so that it can take an appropriate action depending on the configuration of the host.
|
| v2: - CGroupUnified enum replaces open coded integer values to indicate the cgroup operation mode.
|     - Various style updates.
| v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2.
| v4: Restored legacy container on unified host support and fixed another bug in detect_unified_cgroup_hierarchy().
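The three operation modes and the two predicates described in this commit message can be sketched roughly as follows. This is an illustration only, not the elogind implementation: the enum, its constants, the controller string "name=elogind" and both function names are assumptions made for this sketch. The message itself confirms only that a three-way enum exists and that cg_unified(controller) and cg_all_unified() gate the respective code paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical three-way mode, ordered from "least unified" to "most unified". */
typedef enum {
        MODE_LEGACY,          /* every hierarchy is a legacy hierarchy */
        MODE_ELOGIND_UNIFIED, /* only the elogind hierarchy is unified */
        MODE_ALL_UNIFIED      /* everything is on the unified hierarchy */
} CgroupMode;

/* Assumed name of the elogind controller, used for illustration only. */
#define ELOGIND_CONTROLLER "name=elogind"

/* Gate for operations that are specific to kernel resource controllers. */
bool all_unified(CgroupMode mode) {
        return mode == MODE_ALL_UNIFIED;
}

/* Gate for process and service management on a given controller. */
bool controller_unified(CgroupMode mode, const char *controller) {
        assert(controller);

        if (strcmp(controller, ELOGIND_CONTROLLER) == 0)
                return mode >= MODE_ELOGIND_UNIFIED;

        return mode == MODE_ALL_UNIFIED;
}
```

In the middle mode, only the elogind controller reports as unified, so management code can take the unified path while resource controllers such as "cpu" stay on the legacy path.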
* Prep v231: Apply missing fixes from upstream (2/6) src/core (Sven Eden, 2017-06-16)
|
* Prep v230: Apply missing upstream fixes and updates (4/8) src/core. (Sven Eden, 2017-06-16)
|
* core/mount-setup.c: also relabel /dev/shm for selinux (#3039) (Harald Hoyer, 2017-06-16)
| Daemons that wish to transition state from the initramfs to the real root might use /dev/shm for their state. As /dev is not relabeled across mount points, /dev/shm has to be relabeled explicitly.
* Prep v229: Add missing fixes from upstream [2/6] src/core (Sven Eden, 2017-05-17)
|
* core: log about path_is_mount_point() errors (Lennart Poettering, 2017-05-17)
| We really shouldn't fail silently, but print a log message about these errors. Also make sure to attach error codes to all log messages where that makes sense.
|
| (While we are at it, add a couple of (void) casts to functions where we knowingly ignore return values.)
* mount-setup.c: fix handling of symlink Smack labelling in cgroup setup (Patrick Ohly, 2017-05-17)
| The code introduced in f8c1a81c51 (= elogind 227) failed for me with:
|
|     Failed to copy smack label from net_cls to /sys/fs/cgroup/net_cls: No such file or directory
|
| There is no need for a symlink in this case because source and target are identical. The symlink() call is allowed to fail when the target already exists. When that happens, copying the Smack label must be skipped.
|
| But the code also failed when there is a symlink, like "cpu -> cpu,cpuacct", because mac_smack_copy() got called with src="cpu,cpuacct" which fails to find the entry because the current directory is not inside /sys/fs/cgroup. The absolute path to the existing entry must be used instead.
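The three rules from this commit message (no symlink when source and target are identical, skip the label copy when the link already exists, and pass an absolute source path) can be sketched as below. This is a minimal sketch under stated assumptions, not the patch itself: link_controller() and the label_copy_fn callback are hypothetical names, and the callback merely stands in for a helper such as mac_smack_copy(), whose real signature is not shown in the log.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for a label-copy helper: copies the label of src
 * onto dest and returns 0 on success or a negative errno value. */
typedef int (*label_copy_fn)(const char *dest, const char *src);

/* Creates the symlink root/name -> target and copies the label from the
 * absolute path of the target. Returns 0 on success or when there is
 * nothing to do, and a negative errno value otherwise. */
int link_controller(const char *root, const char *name, const char *target,
                    label_copy_fn copy_label) {
        char link_path[PATH_MAX], target_path[PATH_MAX];
        int n;

        assert(root && name && target);

        /* Source and target are identical: no symlink is needed. */
        if (strcmp(name, target) == 0)
                return 0;

        n = snprintf(link_path, sizeof link_path, "%s/%s", root, name);
        if (n < 0 || (size_t) n >= sizeof link_path)
                return -ENAMETOOLONG;

        n = snprintf(target_path, sizeof target_path, "%s/%s", root, target);
        if (n < 0 || (size_t) n >= sizeof target_path)
                return -ENAMETOOLONG;

        if (symlink(target, link_path) < 0) {
                /* The link already exists: skip the label copy entirely. */
                if (errno == EEXIST)
                        return 0;
                return -errno;
        }

        /* Pass the absolute path, because the current directory is not root. */
        return copy_label ? copy_label(link_path, target_path) : 0;
}
```

A caller would invoke it as, for example, link_controller("/sys/fs/cgroup", "cpu", "cpu,cpuacct", some_copy_helper), where some_copy_helper is whatever label-copy function the build provides. Passing NULL skips the label step.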
* Prep v228: Condense elogind source masks (5/5) (Sven Eden, 2017-04-26)
|
* Prep v228: Condense elogind source masks (4/5) (Sven Eden, 2017-04-26)
|
* Prep v228: Add remaining updates from upstream (3/3) (Sven Eden, 2017-04-26)
| Apply remaining fixes and the performed move of utility functions into their own foo-util.[hc] files on the rest of elogind.
* [3/5] Apply missing fixes from upstream (Sven Eden, 2017-03-29)
|
* mount: propagate error codes correctly (David Herrmann, 2017-03-29)
| Make sure to propagate error codes from mount-loops correctly. Right now, we return the return-code of the first mount that did _something_. This is not what we want. Make sure we return an error if _any_ mount fails (and then make sure to return the first error to not hide proper errors due to consequential errors like -ENOTDIR).
|
| Reported by cee1 <fykcee1@gmail.com>.
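The propagation rule described here (attempt every mount, fail if any of them fails, and report the first error rather than a later consequential one) might look like the following sketch. The names mount_all(), mount_fn and demo_mount() are hypothetical and exist only for this illustration; the real loop in mount-setup.c is not reproduced in the log.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: returns 0 on success or a negative errno value. */
typedef int (*mount_fn)(const char *where);

/* Tries every entry, keeps going after a failure, and returns the first
 * error that was seen (or 0 if every mount succeeded). */
int mount_all(const char *const *points, size_t n, mount_fn do_mount) {
        int r = 0;

        assert(points || n == 0);
        assert(do_mount);

        for (size_t i = 0; i < n; i++) {
                int q = do_mount(points[i]);

                if (q < 0 && r == 0)
                        r = q; /* remember only the first failure */
        }

        return r;
}

/* Demo callback for illustration only: paths containing "missing" fail with
 * -ENOENT and paths containing "notdir" fail with -ENOTDIR. */
int demo_mount(const char *where) {
        if (strstr(where, "missing"))
                return -ENOENT;
        if (strstr(where, "notdir"))
                return -ENOTDIR;
        return 0;
}
```

With the demo callback, a table of "/ok", "/missing", "/sub-notdir" yields -ENOENT: the later -ENOTDIR does not overwrite the first error, and the successful first entry does not mask the failures.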
* smack: bugfix the smack label of symlink when '--with-smack-run-label' is set (Sangjung Woo, 2017-03-29)
| Even when elogind has its own smack label because the '--with-smack-run-label' configuration is set, the smack label of each CGROUP root directory should be the star (i.e. *) label. This is mainly because the current Linux kernel sets the label in this way. (Refer to smack_d_instantiate() in security/smack/smack_lsm.c)
|
| However, if elogind has its own smack label and arg_join_controllers is explicitly set or initialized by the initialize_join_controllers() function, current elogind creates the symlink in the CGROUP root directory with its own smack label, as below.
|
|     lrwxrwxrwx. 1 root root System 11 Dec 31 16:00 cpu -> cpu,cpuacct
|     dr-xr-xr-x. 4 root root * 0 Dec 31 16:01 cpu,cpuacct
|     lrwxrwxrwx. 1 root root System 11 Dec 31 16:00 cpuacct -> cpu,cpuacct
|
| This patch fixes that bug by copying the smack label from the origin.
* Add mounting of a name=elogind cgroup if no init controller is found. (Sven Eden, 2017-03-14)
| This is done for systems whose init system is not a cgroup controller. One example is runit on Void Linux.
* Remove src/core (Andy Wingo, 2015-04-19)
|
* remove unused includes (Thomas Hindoe Paaboel Andersen, 2015-02-23)
| This patch removes includes that are not used. The removals were found with include-what-you-use which checks if any of the symbols from a header is in use.
* mount-setup: Do not bother with /proc/bus/usb (Cristian Rodríguez, 2015-01-23)
| Current systemd requires kernel >= 3.7 per the README file but CONFIG_USB_DEVICEFS disappeared from the kernel in upstream commit fb28d58b72aa9215b26f1d5478462af394a4d253 (kernel 3.5-rc1)
* mount-setup: /selinux, /cgroup, /dev/cgroup are sooo old, don't bother with them anymore (Lennart Poettering, 2015-01-23)
|
* remove unneeded libgen.h includes (Cristian Rodríguez, 2015-01-17)
|
* nspawn: mount most of the cgroup tree read-only in nspawn containers except for the container's own subtree in the name=systemd hierarchy (Lennart Poettering, 2015-01-05)
| More specifically mount all other hierarchies in their entirety and the name=systemd above the container's subtree read-only.
* treewide: another round of simplifications (Michal Schmidt, 2014-11-28)
| Using the same scripts as in f647962d64e "treewide: yet more log_*_errno + return simplifications".
* treewide: use log_*_errno whenever %m is in the format string (Michal Schmidt, 2014-11-28)
| If the format string contains %m, clearly errno must have a meaningful value, so we might as well use log_*_errno to have ERRNO= logged.
|
| Using:
|
|     find . -name '*.[ch]' | xargs sed -r -i -e \
|     's/log_(debug|info|notice|warning|error|emergency)\((".*%m.*")/log_\1_errno(errno, \2/'
|
| Plus some whitespace, linewrap, and indent adjustments.
* core: reindent mount/kmod tables (Lennart Poettering, 2014-11-26)
|
* mount-setup: remove mount_setup_late() (Daniel Mack, 2014-11-14)
| Turns out we can just do kmod_setup() earlier, before we do mount_setup(), so there's no need for mount_setup_late() anymore. Instead, put kdbusfs in mount_table[].
* sd-bus: sync with kdbus upstream (ABI break) (Daniel Mack, 2014-11-13)
| kdbus has seen a larger update than expected lately, most notably with kdbusfs, a file system to expose the kdbus control files:
|
| * Each time a file system of this type is mounted, a new kdbus domain is created.
| * The layout inside each mount point is the same as before, except that domains are not hierarchically nested anymore.
| * Domains are therefore also unnamed now.
| * Unmounting a kdbusfs will automatically also destroy the associated domain.
| * Hence, the action of creating a kdbus domain is now as privileged as mounting a filesystem.
| * This way, we can get around creating dev nodes for everything, which is last but not least something that is not limited by 20-bit minor numbers.
|
| The kdbus specific bits in nspawn have all been dropped now, as nspawn can rely on the container OS to set up its own kdbus domain, simply by mounting a new instance.
|
| A new set of mounts has been added to mount things *after* the kernel modules have been loaded. For now, only kdbus is in this set, which is invoked with mount_setup_late().
* mac: also rename use_{smack,selinux,apparmor}() calls so that they share the new mac_{smack,selinux,apparmor}_xyz() convention (Lennart Poettering, 2014-10-23)
|
* mount-setup: skip relabelling when SELinux and SMACK not supported (Emil Renner Berthing, 2014-10-10)
| This is also the only place where FTW_ACTIONRETVAL is used, so this makes systemd compile without SELinux or SMACK support when the standard library doesn't support this extension.
* hashmap: introduce hash_ops to make struct Hashmap smaller (Michal Schmidt, 2014-09-15)
| It is redundant to store 'hash' and 'compare' function pointers in struct Hashmap separately. The functions always comprise a pair. Store a single pointer to struct hash_ops instead.
|
| systemd keeps hundreds of hashmaps, so this saves a little bit of memory.
* mount-setup: fix counting of early mounts without SMACK (Lennart Poettering, 2014-08-13)
| http://lists.freedesktop.org/archives/systemd-devel/2014-August/021772.html
* core: Don't require cgroups xattr support (Tom Hirst, 2014-06-26)
| Failure to mount cgroups with xattr should not be fatal
* core: You can not put the cached result of use_smack fct, as we are not sure the "/sys" is mounted. (Ronan Le Martret, 2014-06-23)
| So we should mount "/sys" before "/proc"
|
| https://bugs.freedesktop.org/show_bug.cgi?id=77646
* build-sys: use glibc's xattr support instead of requiring libattr (Kay Sievers, 2014-05-28)
|
* core: require cgroups filesystem to be available (Kay Sievers, 2014-05-05)
| We should no longer pretend that we can run in any sensible way without the kernel supporting us with cgroups functionality.
* core: don't try to relabel mounts before we loaded the policy (Lennart Poettering, 2014-03-24)
|
* core: remount /sys/fs/cgroup/ read-only after we mounted all controllers (Lennart Poettering, 2014-03-18)
| Given that glibc searches for /dev/shm by just looking for any tmpfs we should be more careful with providing tmpfs instances arbitrary code might end up writing to.
* cgroup: it's not OK to invoke alloca() in loops (Lennart Poettering, 2014-03-18)
|
* exec: introduce PrivateDevices= switch to provide services with a private /dev (Lennart Poettering, 2014-01-20)
| Similar to PrivateNetwork= and PrivateTmp=, introduce PrivateDevices=, which sets up a private /dev with only the API pseudo-devices like /dev/null, /dev/zero, /dev/random, but not any physical devices in it.
* tree-wide usage of %m specifier instead of strerror(errno) (Daniel Buch, 2013-11-26)
| Also for log_error() except where a specific error is specified, e.g. errno ? strerror(errno) : "Some user specified message"
* Smack: Test if smack is enabled before mounting (Auke Kok, 2013-10-09)
| Since on most systems with xattr systemd will compile with Smack support enabled, we still attempt to mount various fs's with Smack-only options.
|
| Before mounting any of these Smack-related filesystems with Smack specific mount options, check if Smack is functionally active on the running kernel.
|
| If Smack is really enabled in the kernel, all these Smack mounts are now *fatal*, as they should be.
|
| We no longer mount smackfs if systemd was compiled without Smack support. This makes it easier to make smackfs mount failures a critical error when Smack is enabled.
|
| We no longer mount these filesystems with their Smack specific options inside containers. There these filesystems will be mounted with their non-Smack mount options for now.
* Mount /run, /dev/shm usable to tasks when using SMACK. (Auke Kok, 2013-10-07)
| Once systemd itself is running in a security domain for SMACK, it will fail to start countless tasks due to missing privileges for mounted and created directory structures. For /run and shm specifically, we grant all tasks access.
|
| These 2 mounts are allowed to fail, which will happen if the system is not running a SMACK enabled kernel or security=none is passed to the kernel.
* remove /run/initramfs/root-fsck logic (Harald Hoyer, 2013-07-17)
| dracut uses systemd in the initramfs and does not write these files anymore.
|
| The state of the root fsck is serialized.
* Small cleanup (Zbigniew Jędrzejewski-Szmek, 2013-04-24)
|
* Add set_consume which always takes ownership (Zbigniew Jędrzejewski-Szmek, 2013-04-24)
| Freeing in error path is the common pattern with set_put().
* Standarize on one spelling of symlink error message (Zbigniew Jędrzejewski-Szmek, 2013-04-24)
| It's polite to print the name of the link that wasn't created, and it makes little sense to print the target.
* systemd: fall back to mounting /sys/fs/cgroup sans xattr (Zbigniew Jędrzejewski-Szmek, 2013-04-24)
| xattrs on cgroup fs were added back in v3.6-rc3-3-g03b1cde. But we support kernels >= 2.6.39, and we should also support kernels compiled w/o xattr support, even if systemd is compiled with xattr support. Fall back to mounting without xattr support.
|
| Tested-by: Colin Walters <walters@verbum.org>
* systemd,nspawn: use extended attributes to store metadata (Zbigniew Jędrzejewski-Szmek, 2013-04-21)
| All attributes are stored as text, since root_directory is already text, and it seems easier to have all of them in text format.
|
| Attributes are written in the trusted. namespace, because the kernel currently does not allow user. attributes on cgroups. This is a PITA, and CAP_SYS_ADMIN is required to *read* the attributes. Alas.
|
| A second pipe is opened for the child to signal the parent that the cgroup hierarchy has been set up.
* util: make time formatting a bit smarter (Lennart Poettering, 2013-04-04)
| Instead of outputting "5h 55s 50ms 3us" we'll now output "5h 55.050003s". Also, the output accuracy is now configurable.
|
| Basically we now try to use "dot notation" for all time values > 1min. For >= 1s we use 's' as unit, otherwise for >= 1ms we use 'ms' as unit, and finally 'us'.
|
| This should give reasonable values in most cases.
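The "dot notation" idea from this message can be illustrated with a simplified formatter. This is a sketch under stated assumptions rather than the function the commit changed: format_span() is a hypothetical name, it always prints the full fractional part, and it leaves out the configurable accuracy mentioned in the message.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define USEC_PER_MSEC UINT64_C(1000)
#define USEC_PER_SEC  UINT64_C(1000000)
#define USEC_PER_MIN  (UINT64_C(60) * USEC_PER_SEC)
#define USEC_PER_HOUR (UINT64_C(60) * USEC_PER_MIN)

/* Formats a time span given in microseconds. Hours and minutes are printed
 * as whole units; whatever is left below one minute is printed in dot
 * notation using the largest unit that fits (s, then ms, then us).
 * Returns 0 on success or -1 if buf is too small. */
int format_span(char *buf, size_t len, uint64_t usec) {
        char hours[32] = "", minutes[32] = "", rest[48] = "";
        uint64_t h = usec / USEC_PER_HOUR;
        uint64_t m = (usec % USEC_PER_HOUR) / USEC_PER_MIN;
        uint64_t r = usec % USEC_PER_MIN; /* always less than one minute */
        int n;

        assert(buf && len > 0);

        if (h > 0)
                snprintf(hours, sizeof hours, "%" PRIu64 "h", h);
        if (m > 0)
                snprintf(minutes, sizeof minutes, "%" PRIu64 "min", m);

        if (r >= USEC_PER_SEC)
                snprintf(rest, sizeof rest, "%" PRIu64 ".%06" PRIu64 "s",
                         r / USEC_PER_SEC, r % USEC_PER_SEC);
        else if (r >= USEC_PER_MSEC)
                snprintf(rest, sizeof rest, "%" PRIu64 ".%03" PRIu64 "ms",
                         r / USEC_PER_MSEC, r % USEC_PER_MSEC);
        else if (r > 0 || (h == 0 && m == 0))
                snprintf(rest, sizeof rest, "%" PRIu64 "us", r);

        /* Join the non-empty parts with single spaces. */
        n = snprintf(buf, len, "%s%s%s%s%s",
                     hours,
                     (strlen(hours) > 0 && (strlen(minutes) > 0 || strlen(rest) > 0)) ? " " : "",
                     minutes,
                     (strlen(minutes) > 0 && strlen(rest) > 0) ? " " : "",
                     rest);

        return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

For the value from the message, 5h 55s 50ms 3us is 18055050003 microseconds, and format_span() renders it as "5h 55.050003s".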
* mount: mount all cgroup controllers in containers, too (Lennart Poettering, 2013-03-22)
|
* core: reuse the same /tmp, /var/tmp and inaccessible dir (Michal Sekletar, 2013-03-15)
| All Exec commands within the service get the same /tmp and /var/tmp directories mounted if the service is configured with PrivateTmp=yes. Temporary directories are cleaned up by the service itself in addition to systemd-tmpfiles. The directory which is mounted as inaccessible is created at runtime in /run/systemd.
* sd-booted: update sd_booted() check a bit (Lennart Poettering, 2013-03-15)
| Previously we were testing whether /sys/fs/cgroup/systemd/ was a mount point. This might be problematic however, when the cgroup trees are bind mounted into a container from the host (which should be absolutely valid), which might create the impression that the container was running systemd, but only the host actually is.
|
| Replace this by a check for the existence of the directory /run/systemd/system/, which should work unconditionally, since /run can never be a bind mount but *must* be a tmpfs on systemd systems, which is flushed at boot. This means that data in /run always reflects information about the current boot, and only of the local container, which makes it the perfect choice for a check like this.
|
| (As side effect this is nice to Ubuntu people who now use logind with the systemd cgroup hierarchy, where the old sd_booted() check misdetects systemd, even though they still run legacy Upstart.)
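The directory-existence check described in this message can be approximated with a few lines of POSIX C. This is a minimal sketch, not the sd_booted() source: is_directory() and booted_with_systemd() are hypothetical helper names, and using lstat() is an assumption about one reasonable way to test for the directory.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Returns true if path exists and is a directory. */
bool is_directory(const char *path) {
        struct stat st;

        assert(path);

        if (lstat(path, &st) < 0)
                return false;

        return S_ISDIR(st.st_mode);
}

/* The check from the commit message: /run is flushed at boot and local to
 * the container, so this directory exists only if systemd set it up during
 * the current boot. */
bool booted_with_systemd(void) {
        return is_directory("/run/systemd/system/");
}
```

Unlike the earlier mount-point test on /sys/fs/cgroup/systemd/, this check is not confused by cgroup trees that are bind mounted from the host into a container.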