path: root/src/basic/cgroup-util.c
Commit message | Author | Age
...
* util-lib: various improvements to kernel command line parsing | Lennart Poettering | 2017-07-17
  This improves kernel command line parsing in a number of ways:
  a) A kernel option "foo_bar=xyz" is now considered equivalent to "foo-bar=xyz", i.e. when comparing kernel command line option names, "-" and "_" are now considered equivalent (this only applies to the option names, not the option values!). So far, most of our kernel options used "-" as the word separator, but some used "_", which was a source of confusion for users (well, at least for one user: myself; I just couldn't remember that it's elogind.debug-shell, not elogind.debug_shell). Considering both as equivalent is inspired by how modern kernel module loading normalizes all kernel module names to use underscores, too.
  b) All options previously using a dash to separate words in kernel command line option names now use an underscore instead, in all documentation and in code. Since a) has been implemented, this should not create any compatibility problems, but it normalizes our documentation and our code.
  c) All kernel command line options which take booleans (or are boolean-like) have been reworked so that "foobar" (without argument) is now equivalent to "foobar=1" (but not "foobar=0"), thus normalizing the handling of our boolean arguments. Specifically, this means elogind.debug-shell and elogind.debug_shell=1 are now entirely equivalent.
  d) All kernel command line options which take an argument now log a message when no argument is specified. E.g. passing just "elogind.unit" will now result in a complaint that it needs an argument. This is implemented in the proc_cmdline_missing_value() function.
  e) There's now a call proc_cmdline_get_bool(), similar to proc_cmdline_get_key(), that parses booleans (following the logic explained in c).
  f) The proc_cmdline_parse() call's boolean argument has been replaced by a new flags argument that takes a common set of bits with proc_cmdline_get_key().
  g) All kernel command line APIs now begin with the same "proc_cmdline_" prefix.
  h) There are now tests for much of this. Yay!
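Points a) and c) can be illustrated with a small C sketch. It is not the proc_cmdline_* implementation: cmdline_key_eq() and cmdline_word_get_bool() are hypothetical helpers, and the set of accepted boolean spellings (1/yes/true/on and 0/no/false/off) is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: does the option name a[0..n) equal key, treating '-' and '_'
 * as the same character? Only names are normalized, never values. */
static bool cmdline_key_eq(const char *a, size_t n, const char *key) {
        for (size_t i = 0; i < n; i++) {
                char x = a[i] == '-' ? '_' : a[i];
                char y = key[i] == '-' ? '_' : key[i];

                if (y == '\0' || x != y)
                        return false;
        }
        return key[n] == '\0';
}

/* Hypothetical helper: check one command line word ("name" or "name=value") against a
 * boolean option. Returns 1 and sets *ret on a match, 0 if the word names a different
 * option, -EINVAL if the value is not a recognized boolean (assumed spellings). */
static int cmdline_word_get_bool(const char *word, const char *key, bool *ret) {
        const char *eq = strchr(word, '=');
        size_t n = eq ? (size_t) (eq - word) : strlen(word);
        const char *v;

        if (!cmdline_key_eq(word, n, key))
                return 0;

        if (!eq) {
                /* A bare "foobar" means "foobar=1". */
                *ret = true;
                return 1;
        }

        v = eq + 1;
        if (!strcmp(v, "1") || !strcmp(v, "yes") || !strcmp(v, "true") || !strcmp(v, "on")) {
                *ret = true;
                return 1;
        }
        if (!strcmp(v, "0") || !strcmp(v, "no") || !strcmp(v, "false") || !strcmp(v, "off")) {
                *ret = false;
                return 1;
        }
        return -EINVAL;
}
```

With this, "elogind.debug-shell" and "elogind.debug_shell=1" both yield true for either spelling of the key, while the value itself is compared verbatim.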
* core: keep supporting cgroup hybrid layout from v232 for live upgrades | Tejun Heo | 2017-07-17
  v232's cgroup hybrid mode mounted v2 on /sys/fs/cgroup/elogind, which unfortunately broke other tools which expect v1 there. From v233 on, hybrid mode instead mounts and uses v2 on /sys/fs/cgroup/unified and keeps /sys/fs/cgroup/elogind on v1 for compatibility with external tools. However, to keep elogind live upgrades working, v233+ should be able to recognize the v232 layout and keep using it.
  This patch adds v232 hybrid mode support. If the v232 layout is detected, cg_unified(SYSTEMD_CGROUP_CONTROLLER) keeps returning %true but cg_hybrid_unified() returns %false. This keeps process management on cgroup v2 but turns off the parallel layout.
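A minimal sketch of how the three layouts could be told apart, assuming detection by filesystem type as reported by statfs(2). To keep it testable, the f_type values are passed in rather than queried; classify_layout() and the Layout enum are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Filesystem magic numbers as in <linux/magic.h>, repeated so the sketch is self-contained. */
#define SKETCH_TMPFS_MAGIC          0x01021994UL
#define SKETCH_CGROUP_SUPER_MAGIC   0x0027e0ebUL
#define SKETCH_CGROUP2_SUPER_MAGIC  0x63677270UL

typedef enum {
        LAYOUT_UNKNOWN,
        LAYOUT_LEGACY,       /* v1 everywhere */
        LAYOUT_HYBRID,       /* v233+: v2 on .../unified, v1 kept on .../elogind */
        LAYOUT_HYBRID_V232,  /* v232: v2 mounted directly on .../elogind */
        LAYOUT_UNIFIED,      /* v2 on /sys/fs/cgroup itself */
} Layout;

/* Arguments are the f_type values statfs(2) would report for /sys/fs/cgroup,
 * /sys/fs/cgroup/unified and /sys/fs/cgroup/elogind (0 if the call failed). */
static Layout classify_layout(unsigned long root, unsigned long unified, unsigned long elogind) {
        if (root == SKETCH_CGROUP2_SUPER_MAGIC)
                return LAYOUT_UNIFIED;
        if (root != SKETCH_TMPFS_MAGIC)
                return LAYOUT_UNKNOWN;
        if (unified == SKETCH_CGROUP2_SUPER_MAGIC)
                return LAYOUT_HYBRID;
        if (elogind == SKETCH_CGROUP2_SUPER_MAGIC)
                return LAYOUT_HYBRID_V232;
        if (elogind == SKETCH_CGROUP_SUPER_MAGIC)
                return LAYOUT_LEGACY;
        return LAYOUT_UNKNOWN;
}

/* Process management runs on v2 in every layout except plain legacy. */
static bool layout_elogind_on_v2(Layout l) {
        return l == LAYOUT_HYBRID || l == LAYOUT_HYBRID_V232 || l == LAYOUT_UNIFIED;
}

/* Only the v233+ layout keeps the parallel v1 compat hierarchy in sync. */
static bool layout_hybrid_parallel(Layout l) {
        return l == LAYOUT_HYBRID;
}
```

The v232 case is the one where the first predicate is true and the second false, matching the cg_unified() / cg_hybrid_unified() split described above.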
* core: make SYSTEMD_CGROUP_CONTROLLER a special string | Tejun Heo | 2017-07-17
  SYSTEMD_CGROUP_CONTROLLER is currently defined as "name=elogind", which the cgroup utility functions interpret as a named cgroup hierarchy with the specified name. With the planned cgroup hybrid mode changes, SYSTEMD_CGROUP_CONTROLLER would map to different hierarchy names.
  This patch makes SYSTEMD_CGROUP_CONTROLLER a special string "_elogind", which the cgroup utility functions substitute with "name=elogind". This allows callers to address the elogind hierarchy without specifying the hierarchy name, leaving the cgroup utility functions free to map it to whatever is appropriate.
  Note that SYSTEMD_CGROUP_CONTROLLER was already special on the full unified cgroup hierarchy even before this patch.
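The substitution can be sketched as a lookup that resolves the placeholder before any path is built. The function name and the hybrid-mode directory "unified" are assumptions taken from the hybrid-layout commits in this log, not the actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_CGROUP_CONTROLLER "_elogind"        /* special placeholder, per the commit */
#define SKETCH_CONTROLLER_LEGACY "name=elogind"    /* named v1 hierarchy */
#define SKETCH_CONTROLLER_HYBRID "unified"         /* assumed v2 mount dir in hybrid mode */

/* Hypothetical: map a controller name to the directory below /sys/fs/cgroup holding it. */
static const char *controller_to_dirname(const char *controller, bool hybrid) {
        /* Resolve the placeholder first, so callers never spell the hierarchy name. */
        if (strcmp(controller, SKETCH_CGROUP_CONTROLLER) == 0)
                controller = hybrid ? SKETCH_CONTROLLER_HYBRID : SKETCH_CONTROLLER_LEGACY;

        /* Named hierarchies live under their bare name, e.g. "name=elogind" -> "elogind". */
        if (strncmp(controller, "name=", 5) == 0)
                return controller + 5;
        return controller;
}
```

Kernel controllers such as "cpu" pass through unchanged; only the placeholder depends on the mode.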
* core: simplify cg_[all_]unified() | Tejun Heo | 2017-07-17
  cg_[all_]unified() test whether a specific controller or all controllers are on the unified hierarchy. While what's being asked is a simple binary question, the callers must assume that the functions may fail at any time, which unnecessarily complicates their usage. Internally, the test result is cached anyway, and there are only a few places where the test actually needs to be performed.
  This patch simplifies cg_[all_]unified():
  * cg_[all_]unified() are updated to return bool. If the result can't be decided, an assertion failure is triggered. Error handling is dropped from their callers.
  * cg_unified_flush() is updated to calculate the new result synchronously and return whether it succeeded or not. Places which need to flush the test result are updated to test for failure. This ensures that all following cg_[all_]unified() tests succeed.
  * Places which expected possible cg_[all_]unified() failures are updated to call and test cg_unified_flush() before calling cg_[all_]unified(). This includes functions used while setting up mounts during boot and manager_setup_cgroup().
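The resulting contract (one fallible flush, then infallible bool queries) might look roughly like this. probe_unified() is a hypothetical stand-in for the real detection, and all names are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

typedef enum {
        UNIFIED_UNKNOWN = -1,  /* not determined yet, or the last flush failed */
        UNIFIED_NONE,
        UNIFIED_ELOGIND,
        UNIFIED_ALL,
} UnifiedMode;

static UnifiedMode unified_cache = UNIFIED_UNKNOWN;

/* Hypothetical stand-in for the real probe (e.g. statfs() on the mount points). */
static int fake_probe_error = 0;
static UnifiedMode fake_probe_mode = UNIFIED_NONE;

static int probe_unified(UnifiedMode *ret) {
        if (fake_probe_error < 0)
                return fake_probe_error;
        *ret = fake_probe_mode;
        return 0;
}

/* Recompute the cached result synchronously; the only call that can fail. */
static int unified_flush(void) {
        UnifiedMode m;
        int r;

        unified_cache = UNIFIED_UNKNOWN;
        r = probe_unified(&m);
        if (r < 0)
                return r;
        unified_cache = m;
        return 0;
}

/* Plain bool: callers must have flushed successfully first, otherwise this asserts. */
static bool all_unified(void) {
        assert(unified_cache != UNIFIED_UNKNOWN);
        return unified_cache == UNIFIED_ALL;
}
```

Error handling thus concentrates in the few places that call the flush, and every later query is a plain boolean test.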
* core: make hybrid cgroup unified mode keep compat /sys/fs/cgroup/elogind hierarchy | Tejun Heo | 2017-07-17
  Currently the hybrid mode mounts cgroup v2 on /sys/fs/cgroup/elogind instead of the v1 name=elogind hierarchy. While this works fine for elogind itself, it breaks tools which expect the cgroup v1 hierarchy on /sys/fs/cgroup/elogind.
  This patch updates the hybrid mode so that it mounts the v2 hierarchy on /sys/fs/cgroup/unified and keeps the v1 "name=elogind" hierarchy on /sys/fs/cgroup/elogind for compatibility. elogind itself doesn't depend on the "name=elogind" hierarchy at all. All operations take place on the v2 hierarchy as before, but the v1 hierarchy is kept in sync so that any tools which expect it to be there can keep using it. This allows elogind to take advantage of cgroup v2 process management without requiring other tools to be aware of the hybrid mode.
  The hybrid mode is implemented by mapping the special elogind controller to /sys/fs/cgroup/unified and making the basic cgroup utility operations (cg_attach(), cg_create(), cg_rmdir() and cg_trim()) also operate on the /sys/fs/cgroup/elogind hierarchy whenever the cgroup2 hierarchy is updated.
  While a bit messy, this allows dropping the complications of using cgroup v1 for process management a lot sooner than otherwise possible, which should make it a net gain in terms of maintainability.
  v2: Fixed !cgns breakage reported by @evverx and renamed the unified mount point to /sys/fs/cgroup/unified as suggested by @brauner.
  v3: chown the compat hierarchy too on delegation. Suggested by @evverx.
  v4: [zj] Drop the change to the default; full "legacy" is still the default.
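A sketch of the mirroring idea: in hybrid mode each basic operation resolves to two directories, the v2 one elogind manages and the v1 compat one kept in sync. hybrid_target_dirs() is hypothetical; real code would then perform the mkdir(2), PID write, or rmdir(2) on each path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define DIR_MAX 256

/* Hypothetical helper: list the directories a basic operation such as cg_create() would
 * touch for a cgroup path (e.g. "/user.slice"). Returns the count, or -ENAMETOOLONG. */
static int hybrid_target_dirs(bool hybrid, const char *path, char out[2][DIR_MAX]) {
        int n = 0;

        if (hybrid) {
                /* Primary: the v2 hierarchy elogind actually manages processes on. */
                if (snprintf(out[n++], DIR_MAX, "/sys/fs/cgroup/unified%s", path) >= DIR_MAX)
                        return -ENAMETOOLONG;
        }

        /* Compat copy (hybrid) or the only hierarchy (legacy): v1 "name=elogind". */
        if (snprintf(out[n++], DIR_MAX, "/sys/fs/cgroup/elogind%s", path) >= DIR_MAX)
                return -ENAMETOOLONG;

        return n;
}
```

Keeping the v2 path first reflects that the v2 hierarchy is authoritative and the v1 tree is only a mirror.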
* core: don't use the unified hierarchy for the elogind cgroup yet (#4628) | Martin Pitt | 2017-07-17
  Too many things don't get along with the unified hierarchy yet:
  * https://github.com/opencontainers/runc/issues/1175
  * https://github.com/docker/docker/issues/28109
  * https://github.com/lxc/lxc/issues/1280
  So revert the default to the legacy hierarchy for now. Developers of the above software can opt into the unified hierarchy with "elogind.legacy_elogind_cgroup_controller=0".
* Rename formats-util.h to format-util.h | Zbigniew Jędrzejewski-Szmek | 2017-07-17
  We don't have a plural in the name of any other -util files, and this inconsistency trips me up every time I try to type this file name from memory. "formats-util" is even hard to pronounce.
* tree-wide: drop NULL sentinel from strjoin | Zbigniew Jędrzejewski-Szmek | 2017-07-17
  This makes strjoin and strjoina more similar and avoids the useless final argument.
    spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/elogind -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libelogind/sd-bus -I ./src/libelogind/sd-event -I ./src/libelogind/sd-login -I ./src/libelogind/sd-netlink -I ./src/libelogind/sd-network -I ./src/libelogind/sd-hwdb -I ./src/libelogind/sd-device -I ./src/libelogind/sd-id128 -I ./src/libelogind-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/*.c)
    git grep -e '\bstrjoin\b.*NULL' -l|xargs sed -i -r 's/strjoin\((.*), NULL\)/strjoin(\1)/'
  This might have missed a few cases (spatch has a really hard time dealing with _cleanup_ macros), but that's no big issue; they can always be fixed later.
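The usual way to drop a NULL sentinel from a variadic C API is a wrapper macro that appends it. This sketch shows that pattern with hypothetical names (strjoin_real_sketch(), strjoin_sketch()); it is not copied from the tree.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Worker: concatenates its arguments until it reaches a NULL sentinel. The gcc/clang
 * "sentinel" attribute makes the compiler warn if a call lacks the trailing NULL. */
static char *strjoin_real_sketch(const char *x, ...) __attribute__((sentinel));

static char *strjoin_real_sketch(const char *x, ...) {
        va_list ap;
        const char *s;
        size_t len = 0;
        char *r, *p;

        /* First pass: total length. */
        va_start(ap, x);
        for (s = x; s; s = va_arg(ap, const char *))
                len += strlen(s);
        va_end(ap);

        r = p = malloc(len + 1);
        if (!r)
                return NULL;

        /* Second pass: copy each piece. */
        va_start(ap, x);
        for (s = x; s; s = va_arg(ap, const char *)) {
                size_t l = strlen(s);
                memcpy(p, s, l);
                p += l;
        }
        va_end(ap);

        *p = '\0';
        return r;
}

/* Callers no longer pass NULL; the macro appends it. Needs at least two arguments. */
#define strjoin_sketch(a, ...) strjoin_real_sketch((a), __VA_ARGS__, NULL)
```

Because the macro pastes __VA_ARGS__ before NULL, a single-argument call does not compile; for one string a plain strdup() is the natural choice.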
* Prep v232.2: cg_shift_path(): With other controllers, elogind might end up in name=foo:/elogind, where cgroup and root are both /elogind. | Sven Eden | 2017-07-05
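A simplified model of the shifting rule, assuming plain string prefix matching. shift_path_sketch() is hypothetical; the point is the equality check that leaves "/elogind" untouched when the root is also "/elogind".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical simplification of cg_shift_path(): strip the cgroup root prefix from a
 * cgroup path, but leave the path alone when it equals the root. Assumes the root has
 * no trailing slash (other than the bare "/"). */
static const char *shift_path_sketch(const char *cgroup, const char *root) {
        size_t n;

        assert(cgroup);
        assert(root);

        /* Nothing to shift for "/", and never shift a path that is the root itself. */
        if (strcmp(root, "/") == 0 || strcmp(cgroup, root) == 0)
                return cgroup;

        n = strlen(root);
        if (strncmp(cgroup, root, n) == 0 && cgroup[n] == '/')
                return cgroup + n;   /* keep the leading '/' of the remainder */

        return cgroup;
}
```

The cgroup[n] == '/' check prevents "/elogindx" from being treated as lying below "/elogind".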
* Prep v232.2: cg_update_unified(): Statically set 'unified_cache' to 'CGROUP_UNIFIED_NONE' | Sven Eden | 2017-07-05
* Prep v232: Do not listen to SYSTEMD_* environment variables to override things. | Sven Eden | 2017-07-05
* Prep v232: Mask new functions that are unneeded by elogind | Sven Eden | 2017-07-05
* Prep v232: Apply missing updates from upstream | Sven Eden | 2017-07-05
* nspawn: cleanup and chown the synced cgroup hierarchy (#4223) | Evgeny Vereshchagin | 2017-07-05
  Fixes: #4181
* core: add "invocation ID" concept to service manager | Lennart Poettering | 2017-07-05
  This adds a new invocation ID concept to the service manager. The invocation ID identifies each runtime cycle of a unit uniquely. A new randomized 128-bit ID is generated each time a unit moves from an inactive to an activating or active state.
  The primary use case for this concept is to connect the runtime data PID 1 maintains about a service with the offline data the journal stores about it. Previously we'd use the unit name plus start/stop times, which however is highly racy since the journal will generally process log data after the service has already ended.
  The "invocation ID" roughly matches the "boot ID" concept of the Linux kernel, except that it applies to an individual unit instead of the whole system.
  The invocation ID is passed to the activated processes as an environment variable. It is additionally stored as an extended attribute on the cgroup of the unit. The latter is used by journald to automatically retrieve it for each logged message and attach it to the log entry. The environment variable is very easily accessible, even for unprivileged services. OTOH the extended attribute is only accessible to privileged processes (this is because cgroupfs only supports the "trusted." xattr namespace, not "user."). The environment variable may be altered by services, while the extended attribute may not be, hence the latter is the better choice for the journal.
  Note that reading the invocation ID off the extended attribute from journald is racy, similar to the way reading the unit name for a logging process is.
  This patch adds APIs to read the invocation ID to sd-id128: sd_id128_get_invocation() may be used in a similar fashion to sd_id128_get_boot().
  PID 1's own logging is updated to always include the invocation ID when it logs information about a unit.
  A new bus call GetUnitByInvocationID() is added that allows retrieving a bus path to a unit by its invocation ID. The bus path is built using the invocation ID, thus providing a path for referring to a unit that is valid only for the current runtime cycle of it.
  Outlook for the future: should the kernel eventually allow passing cgroup information along with AF_UNIX/SOCK_DGRAM messages via a unique cgroup ID, then we can alter the invocation ID to be generated as a hash from that rather than entirely randomly. This way we can derive the invocation ID race-free from the messages.
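A sketch of generating and printing a random 128-bit ID, assuming /dev/urandom as the entropy source. invocation_id_new() and invocation_id_to_string() are hypothetical helpers, not the sd-id128 API; storing the result as a "trusted." extended attribute on the unit's cgroup (via setxattr(2)) is left out because it needs privileges.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: fill id with 16 random bytes read from /dev/urandom. */
static int invocation_id_new(uint8_t id[16]) {
        FILE *f = fopen("/dev/urandom", "r");
        size_t n;

        if (!f)
                return errno > 0 ? -errno : -EIO;
        n = fread(id, 1, 16, f);
        fclose(f);
        return n == 16 ? 0 : -EIO;
}

/* Format as 32 lowercase hex digits, the usual string form of a 128-bit sd-id128 ID. */
static void invocation_id_to_string(const uint8_t id[16], char s[33]) {
        static const char hex[] = "0123456789abcdef";

        for (int i = 0; i < 16; i++) {
                s[2 * i] = hex[id[i] >> 4];
                s[2 * i + 1] = hex[id[i] & 0xf];
        }
        s[32] = '\0';
}
```

A fresh ID per activation is what lets log entries be matched to one runtime cycle rather than to the unit name plus a time window.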
* util: use SPECIAL_ROOT_SLICE macro where appropriate | Lennart Poettering | 2017-07-05
* core: rename cg_unified() to cg_all_unified() | Tejun Heo | 2017-07-05
  A following patch will update cgroup handling so that the elogind controller (/sys/fs/cgroup/elogind) can use the unified hierarchy even if the kernel resource controllers are on the legacy hierarchies. This would require distinguishing whether all controllers are on cgroup v2 or only the elogind controller is. In preparation, this patch renames cg_unified() to cg_all_unified().
  This patch doesn't cause any functional changes.
* core: use the unified hierarchy for the elogind cgroup controller hierarchy | Tejun Heo | 2017-07-05
  Currently, elogind uses either the legacy hierarchies or the unified hierarchy. When the legacy hierarchies are used, elogind uses a named legacy hierarchy mounted on /sys/fs/cgroup/elogind without any kernel controllers for process management. Due to the shortcomings of the legacy hierarchy, this involves a lot of workarounds and complexities.
  Because the unified hierarchy can be mounted and used in parallel to legacy hierarchies, there's no reason for elogind to use a legacy hierarchy for management even if the kernel resource controllers need to be mounted on legacy hierarchies. It can simply mount the unified hierarchy under /sys/fs/cgroup/elogind and use it without affecting other legacy hierarchies. This disables a significant amount of fragile workaround logic and would allow using features which depend on unified hierarchy membership, such as the bpf cgroup v2 membership test. In time, this would also allow deleting the said complexities.
  This patch updates elogind so that it prefers the unified hierarchy for the elogind cgroup controller hierarchy when legacy hierarchies are used for kernel resource controllers.
  * cg_unified(@controller) is introduced, which tests whether the specific controller is on the unified hierarchy; it is used to choose the unified hierarchy code path for process and service management when available. Kernel controller specific operations remain gated by cg_all_unified().
  * The "elogind.legacy_elogind_cgroup_controller" kernel argument can be used to force the use of the legacy hierarchy for the elogind cgroup controller.
  * nspawn: By default nspawn uses the same hierarchies as the host. If UNIFIED_CGROUP_HIERARCHY is set to 1, the unified hierarchy is used for all. If 0, legacy for all.
  * nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of three options: legacy, only the elogind controller on unified, and unified. The value is passed into mount setup functions and controls cgroup configuration.
  * nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount option is moved to mount_legacy_cgroup_hierarchy() so that it can take an appropriate action depending on the configuration of the host.
  v2: - CGroupUnified enum replaces open coded integer values to indicate the cgroup operation mode.
      - Various style updates.
  v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2.
  v4: Restored legacy container on unified host support and fixed another bug in detect_unified_cgroup_hierarchy().
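The difference between cg_unified(@controller) and cg_all_unified() can be modeled as two predicates over a three-valued mode. The enum and function names here are assumptions; the commit only says that a CGroupUnified enum replaced open-coded integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* At the time of this commit the elogind hierarchy was addressed as "name=elogind". */
#define SKETCH_ELOGIND_CONTROLLER "name=elogind"

/* Three operation modes, modeled on the CGroupUnified enum mentioned in v2 (names assumed). */
typedef enum {
        MODE_LEGACY,        /* all hierarchies on cgroup v1 */
        MODE_ELOGIND_ONLY,  /* only the elogind hierarchy on the unified (v2) hierarchy */
        MODE_ALL_UNIFIED,   /* everything on the unified hierarchy */
} CGroupMode;

/* Like cg_unified(@controller): is this particular controller on the unified hierarchy? */
static bool mode_controller_unified(CGroupMode mode, const char *controller) {
        if (mode == MODE_ALL_UNIFIED)
                return true;
        if (mode == MODE_ELOGIND_ONLY)
                return strcmp(controller, SKETCH_ELOGIND_CONTROLLER) == 0;
        return false;
}

/* Like cg_all_unified(): still gates kernel-controller-specific operations. */
static bool mode_all_unified(CGroupMode mode) {
        return mode == MODE_ALL_UNIFIED;
}
```

In the middle mode, process management takes the v2 code path while, for example, CPU settings still go through the v1 code path.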
* core: add cgroup CPU controller support on the unified hierarchy | Tejun Heo | 2017-07-05
  Unfortunately, due to disagreements in the kernel development community, CPU controller cgroup v2 support has not been merged, and enabling it requires applying two small out-of-tree kernel patches. The situation is explained in the following documentation:
    https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu
  While it isn't clear what will happen with CPU controller cgroup v2 support, there are critical features which are possible only on cgroup v2, such as buffered write control, making cgroup v2 essential for a lot of workloads. This commit implements elogind CPU controller support on the unified hierarchy so that users who choose to deploy CPU controller cgroup v2 support can easily take advantage of it.
  On the unified hierarchy, the "cpu.weight" knob replaces "cpu.shares", and "cpu.max" replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us". [Startup]CPUWeight config options are added with the usual compat translation. CPU quota settings remain unchanged and apply to both legacy and unified hierarchies.
  v2: - Error in man page corrected.
      - CPU config application in cgroup_context_apply() refactored.
      - CPU accounting now works on unified hierarchy.
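The compat translation and the cpu.max format can be sketched as follows. The constants (cpu.shares default 1024; cpu.weight default 100 within [1, 10000]; a 100 ms quota period) are assumptions for this sketch and should be checked against the tree; the function names are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WEIGHT_MIN          UINT64_C(1)
#define WEIGHT_MAX          UINT64_C(10000)
#define WEIGHT_DEFAULT      UINT64_C(100)
#define SHARES_DEFAULT      UINT64_C(1024)
#define QUOTA_PERIOD_USEC   UINT64_C(100000)   /* assumed 100 ms period */
#define USEC_PER_SEC        UINT64_C(1000000)
#define NO_QUOTA            UINT64_MAX

/* Compat translation: scale cpu.shares so the two defaults line up, then clamp. */
static uint64_t shares_to_weight(uint64_t shares) {
        uint64_t w = shares * WEIGHT_DEFAULT / SHARES_DEFAULT;

        if (w < WEIGHT_MIN)
                return WEIGHT_MIN;
        if (w > WEIGHT_MAX)
                return WEIGHT_MAX;
        return w;
}

/* Build the "cpu.max" line ("$QUOTA $PERIOD" or "max $PERIOD") from a quota given as
 * CPU time per second of wall-clock time, as with CPUQuota=. Returns snprintf()'s value. */
static int format_cpu_max(uint64_t quota_per_sec_usec, char *buf, size_t size) {
        if (quota_per_sec_usec == NO_QUOTA)
                return snprintf(buf, size, "max %" PRIu64 "\n", QUOTA_PERIOD_USEC);
        return snprintf(buf, size, "%" PRIu64 " %" PRIu64 "\n",
                        quota_per_sec_usec * QUOTA_PERIOD_USEC / USEC_PER_SEC, QUOTA_PERIOD_USEC);
}
```

For example, CPUQuota=20% corresponds to 200000 µs per second, which becomes "20000 100000" in cpu.max; the sketch does not guard against overflow for absurdly large quotas.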
* cgroup: detect cgroup namespaces | Christian Brauner | 2017-07-05
  - define CLONE_NEWCGROUP
  - add a function to detect whether cgroup namespaces are supported
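A sketch of both items, assuming detection by probing for the per-process cgroup namespace file in procfs. cg_ns_supported_sketch() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Fallback for older libc headers; 0x02000000 is the kernel's value. */
#ifndef CLONE_NEWCGROUP
#define CLONE_NEWCGROUP 0x02000000
#endif

/* Kernels with cgroup namespace support expose /proc/<pid>/ns/cgroup. The answer
 * cannot change while running, so it is computed once and cached. */
static bool cg_ns_supported_sketch(void) {
        static int enabled = -1;

        if (enabled < 0)
                enabled = access("/proc/self/ns/cgroup", F_OK) == 0;
        return enabled;
}
```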
* Prep v231.2: login/elogind.c: Remove bus_forward_agent_released() | Sven Eden | 2017-06-20
  This method is called from a systemd manager that is the system instance, to inform all user instances of systemd about the pending cgroup release.
  elogind, on the other hand, always runs just once. The release of cgroups is handled by the local cgroups manager, which should be provided by the running init system. Even if there is no cgroup manager, so that elogind sets itself up as a small cgroups manager, there aren't any user instances that could react to the forwarding anyway.
* Prep v231: Apply missing fixes from upstream (1/6) src/basic | Sven Eden | 2017-06-16
* cgroup: suppress sending follow-up SIGCONT after sending SIGCONT/SIGKILL anyway | Lennart Poettering | 2017-06-16
* core: when forcibly killing/aborting left-over unit processes log about it | Lennart Poettering | 2017-06-16
  Let's log at LOG_NOTICE about any processes that we are going to SIGKILL/SIGABRT because clean termination of them didn't work.
  This turns the various boolean flag parameters of cg_kill(), cg_migrate() and related calls into a single binary flags parameter, simply because the functions now gained even more parameters and the parameter list shouldn't get too long.
  Logging for killing processes is done either when the kill signal is SIGABRT or SIGKILL, or on explicit request if KILL_TERMINATE_AND_LOG instead of KILL_TERMINATE is passed. This isn't used yet in this patch, but is made use of in a later patch.
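Folding booleans into one flags word also makes the signal rules easy to state. The flag names below are hypothetical, KILL_FLAG_LOG merely stands in for the explicit logging request, and the SIGCONT rule reflects the "suppress sending follow-up SIGCONT" commit listed above.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical flag set replacing separate bool parameters. */
typedef enum {
        KILL_FLAG_SIGCONT     = 1 << 0,  /* follow the signal with SIGCONT */
        KILL_FLAG_IGNORE_SELF = 1 << 1,  /* skip the calling process */
        KILL_FLAG_REMOVE      = 1 << 2,  /* remove the cgroup afterwards */
        KILL_FLAG_LOG         = 1 << 3,  /* always log the kill */
} KillFlags;

/* Forced kills are always logged; anything else only on explicit request. */
static bool should_log_kill(int sig, KillFlags flags) {
        return sig == SIGKILL || sig == SIGABRT || (flags & KILL_FLAG_LOG);
}

/* SIGCONT itself already resumes a stopped process, and SIGKILL acts on stopped
 * processes anyway, so a follow-up SIGCONT is pointless for both. */
static bool needs_followup_sigcont(int sig, KillFlags flags) {
        return (flags & KILL_FLAG_SIGCONT) && sig != SIGCONT && sig != SIGKILL;
}
```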
* Prep v230: Apply missing upstream fixes and updates (2/8) src/basic. | Sven Eden | 2017-06-16
* core: update populated event handling in unified hierarchy | Tejun Heo | 2017-06-16
  Earlier during the development of the unified hierarchy, the populated event was reported through the dedicated "cgroup.populated" file; however, the interface was updated so that it's reported through the "populated" field of the "cgroup.events" file. Update the populated event handling logic accordingly.
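A sketch of reading the new interface: cgroup.events holds one "key value" pair per line, so the populated state is the value on the "populated" line. parse_populated() is hypothetical and works on text already read from the file.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical: return 1 or 0 for the "populated" field of cgroup.events content,
 * -EINVAL for an unexpected value, -ENOENT if the field is missing. */
static int parse_populated(const char *events) {
        const char *line = events;

        while (line && *line) {
                const char *nl = strchr(line, '\n');
                size_t len = nl ? (size_t) (nl - line) : strlen(line);

                /* "populated " is 10 characters, followed by a single digit. */
                if (len == 11 && strncmp(line, "populated ", 10) == 0) {
                        if (line[10] == '1')
                                return 1;
                        if (line[10] == '0')
                                return 0;
                        return -EINVAL;
                }
                line = nl ? nl + 1 : NULL;
        }
        return -ENOENT;
}
```

Scanning by key rather than by line position keeps the parser working if the kernel adds further fields to the file.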
* Prep v229: Add missing fixes from upstream [1/6] src/basic | Sven Eden | 2017-05-17
* cgroup: remove support for NetClass= directive | Daniel Mack | 2017-05-17
  Support for net_cls.class_id through the NetClass= configuration directive was added in v227 in preparation for a per-unit packet filter mechanism. However, it turns out the kernel people have decided to deprecate the net_cls and net_prio controllers in v2. Tejun provides a comprehensive justification for this in his commit, which landed during the merge window for kernel v4.5:
    https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bd1060a1d671
  As we're aiming for full support of the v2 cgroup hierarchy, we can no longer support this feature. Userspace tools such as nftables are moving over to setting rules that are specific to the full cgroup path of a task, which obsoletes these controllers anyway.
  This commit removes support for tweaking details in the net_cls controller, but keeps the NetClass= directive around for legacy compatibility reasons.
* tree-wide: check if errno is greater than zero (2) | Zbigniew Jędrzejewski-Szmek | 2017-05-17
  Compare errno with zero in a way that tells gcc that (if the condition is true) errno is positive.
* tree-wide: check if errno is greater than zero | Zbigniew Jędrzejewski-Szmek | 2017-05-17
  gcc is confused by the common idiom of
    return errno ? -errno : -ESOMETHING
  and thinks a positive value may be returned. Replace this condition with errno > 0 to help gcc and avoid many spurious warnings. I filed a gcc RFE a long time ago, but it is hard to say if it will ever be implemented [1].
  Both conventions were used in the codebase; this change makes things more consistent. This is a follow-up to bcb161b0230f.
  [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61846
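The idiom in practice: a small hypothetical helper that converts a libc failure into a negative error code, using the errno > 0 form so the compiler can see the result is always negative.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: open a file for reading, returning 0 or a negative errno code. */
static int open_or_fail(const char *path, FILE **ret) {
        FILE *f = fopen(path, "r");

        if (!f)
                /* "errno > 0" rather than "errno ?": both branches are provably negative,
                 * and a zero errno still maps to a real error code. */
                return errno > 0 ? -errno : -EIO;

        *ret = f;
        return 0;
}
```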
* Prep v228: Condense elogind source masks (5/5) | Sven Eden | 2017-04-26
* Prep v228: Condense elogind source masks (1/5) | Sven Eden | 2017-04-26
  Although a two-line mask like
    /// UNNEEDED by elogind
    #if 0
  works, it is much easier to read (and patch!) if those two lines are condensed into a one-line mask start like
    #if 0 /// UNNEEDED by elogind
* Prep v228: Add remaining updates from upstream (2/3) | Sven Eden | 2017-04-26
  Apply remaining fixes, and move utility functions into their own foo-util.[hc] files in libbasic, as was done upstream.
* Prep v227: Clean up various *-util.[hc] files | Sven Eden | 2017-04-09
  - src/basic/cgroup-util.[hc]
  - src/basic/memfd-util.[hc]
  - src/basic/path-util.[hc]
  - src/basic/process-util.[hc]
  - src/basic/smack-util.[hc]
* [1/5] Apply missing fixes from upstream | Sven Eden | 2017-03-29
* Rename ELOGIND_CGROUP_CONTROLLER back to SYSTEMD_CGROUP_CONTROLLER | Sven Eden | 2017-03-14
  Although it is nice to have it read ELOGIND instead of SYSTEMD, all diffs just show too many irrelevant (false) positives.
* Major cleanup of all leftovers after rebasing on master. | Sven Eden | 2017-03-14
  Patching elogind in several steps, with only partial rebasing on a common commit with upstream, left the tree in a state that was unmergeable with master. By rebasing on master and manually cleaning up all commits, this merge is now possible. However, this process left some orphans, which are cleaned up now.
* Add mounting of a name=elogind cgroup if no init controller is found. | Sven Eden | 2017-03-14
  This is done for systems whose init system is not a cgroup controller. One example is runit on Void Linux.
* Add support for building elogind against musl libc | Sven Eden | 2017-03-14
  * Check whether printf.h is available and define/undef HAVE_PRINTF_H accordingly.
  * Added src/shared/parse-printf-format.[hc] by Emil Renner Berthing <systemd@esmil.dk> that provides parse_printf_format() if printf.h is unavailable.
  * Added src/basic/musl_missing.h by Juergen Buchmueller <pullmoll@t-online.de> that implements glibc functions missing in musl libc as macros.
  * Extended src/basic/musl_missing.h and added src/basic/musl_missing.c providing
    - program_invocation_name
    - program_invocation_short_name and
    - elogind_set_program_name() to set the two where appropriate.
  * Added calls to elogind_set_program_name() to all main() functions where needed.
  * A few other fixes to work nicely with musl libc.
* Fixed gawk script for git-tar target. | Sven Eden | 2017-03-14
  The previous variant was nice and sleek. Unfortunately, there are constructs like:
    #if 0
    (... old code ...)
    #else
    (... alternative code for elogind ...)
    #endif // 0
  These fragments couldn't be handled by the old code, but can be by the new one. To make this work, the preprocessor macros must be set as shown above.
  Apart from that, all lines like:
    /// Any doxygen one-line-comments with elogind in it
  are removed, too. Please note the three slashes.
  And finally, all commented-out #include directives are removed as well.
* Cleaned up more unneeded functions and types in: | Sven Eden | 2017-03-14
  - src/basic/ioprio.h - removed
  - src/basic/ring.h - removed
  - src/basic/capability.[hc] - cleaned
  - src/basic/cgroup-util.[hc] - cleaned
  - src/basic/hostname-util.[hc] - cleaned
  - src/basic/path-util.[hc] - cleaned
  - src/basic/socket-util.h - cleaned
  - src/basic/strv.[hc] - cleaned
  - src/basic/time-util.[hc] - cleaned
  - src/basic/unit-name.[hc] - cleaned
  - src/basic/util.[hc] - cleaned
  - src/libelogind/sd-bus/bus-introspect.c - cleaned
  - src/login/loginctl.c - cleaned
  - src/login/logind-dbus.c - cleaned
  - src/login/logind.h - cleaned
  - src/shared/conf-parser.[hc] - cleaned
* cg_shift_path(): Do not shift if cgroup and root are equal | Sven Eden | 2017-03-14
* Add --enable-debug=elogind configure option and fix cgroup path | Sven Eden | 2017-03-14
  a) Add some debugging messages to track what's going on with elogind's cgroup handling.
  b) Do not create a cgroup path "/elogind" if our cgroup root is already "/elogind".
* Detect running cgroup controller. | Sven Eden | 2017-03-14
  elogind has to run on any system, no matter which init system is in control of the cgroups. So instead of hardcoding "name=foo", configure now greps "1:" in /proc/self/cgroup, which is hopefully the right choice. (Well, to be honest, if it isn't, something is really wrong with the running system...)
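The configure-time check can be mirrored at runtime by parsing /proc/self/cgroup, whose lines have the form "ID:controllers:path". hierarchy1_controllers() is a hypothetical helper operating on already-read text, not what configure actually runs; on a pure cgroup v2 system only a "0::" line exists, so it returns -ENOENT there.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical: copy the controller field of hierarchy ID 1 ("1:<controllers>:<path>")
 * from /proc/self/cgroup text into buf. Returns 0, -EINVAL, -ENAMETOOLONG or -ENOENT. */
static int hierarchy1_controllers(const char *text, char *buf, size_t size) {
        const char *line = text;

        while (line && *line) {
                const char *nl = strchr(line, '\n');

                if (strncmp(line, "1:", 2) == 0) {
                        const char *start = line + 2;
                        const char *colon = strchr(start, ':');
                        size_t len;

                        /* The controller field must end with ':' on this same line. */
                        if (!colon || (nl && colon > nl))
                                return -EINVAL;
                        len = (size_t) (colon - start);
                        if (len + 1 > size)
                                return -ENAMETOOLONG;
                        memcpy(buf, start, len);
                        buf[len] = '\0';
                        return 0;
                }
                line = nl ? nl + 1 : NULL;
        }
        return -ENOENT;
}
```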
* Classify processes from sessions into cgroups | Sven Eden | 2017-03-14
  Create a private cgroup tree associated with no controllers, and use it to map PIDs to sessions.
  Since we use our own path structure, remove internal cgroup-related helpers that interpret the cgroup path structure to pull out users, slices, and scopes.
* Remove src/basic/special.h, as all defines in there are systemd-only. | Sven Eden | 2017-03-14
* [Patch 3/3] Add cgroups initialization and handling | Sven Eden | 2017-03-14
  Let elogind set up cgroups support on its manager initialization and free the cgroups subsystem when the manager is destroyed.
* Prep v226: Mask all unneeded functions | Sven Eden | 2017-03-14
* Prep v226: Apply missing fixes and changes to src/basic | Sven Eden | 2017-03-14
* cgroup: when comparing agent paths, use path_equal() | Lennart Poettering | 2017-03-14
  After all, a path is a path is a path, and we should use path_equal() to compare those.
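Why a plain strcmp() is not enough: two spellings of the same path can differ in repeated or trailing slashes. path_equal_sketch() is a hypothetical, simplified comparison that ignores those but does not resolve ".", "..", or symlinks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical: compare two paths component by component, so "//" and a trailing "/"
 * do not matter. An absolute path never equals a relative one. */
static bool path_equal_sketch(const char *a, const char *b) {
        if ((*a == '/') != (*b == '/'))
                return false;

        for (;;) {
                /* Skip any run of slashes before the next component. */
                a += strspn(a, "/");
                b += strspn(b, "/");

                if (*a == '\0' || *b == '\0')
                        return *a == *b;   /* equal only if both ran out together */

                size_t la = strcspn(a, "/"), lb = strcspn(b, "/");
                if (la != lb || memcmp(a, b, la) != 0)
                        return false;

                a += la;
                b += lb;
        }
}
```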