| Commit message (Collapse) | Author | Age |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This improves kernel command line parsing in a number of ways:
a) An kernel option "foo_bar=xyz" is now considered equivalent to
"foo-bar-xyz", i.e. when comparing kernel command line option names "-" and
"_" are now considered equivalent (this only applies to the option names
though, not the option values!). Most of our kernel options used "-" as word
separator in kernel command line options so far, but some used "_". With
this change, which was a source of confusion for users (well, at least of
one user: myself, I just couldn't remember that it's elogind.debug-shell,
not elogind.debug_shell). Considering both as equivalent is inspired how
modern kernel module loading normalizes all kernel module names to use
underscores now too.
b) All options previously using a dash for separating words in kernel command
line options now use an underscore instead, in all documentation and in
code. Since a) has been implemented this should not create any compatibility
problems, but normalizes our documentation and our code.
c) All kernel command line options which take booleans (or are boolean-like)
have been reworked so that "foobar" (without argument) is now equivalent to
"foobar=1" (but not "foobar=0"), thus normalizing the handling of our
boolean arguments. Specifically this means elogind.debug-shell and
elogind_debug_shell=1 are now entirely equivalent.
d) All kernel command line options which take an argument, and where no
argument is specified will now result in a log message. e.g. passing just
"elogind.unit" will no result in a complain that it needs an argument. This
is implemented in the proc_cmdline_missing_value() function.
e) There's now a call proc_cmdline_get_bool() similar to proc_cmdline_get_key()
that parses booleans (following the logic explained in c).
f) The proc_cmdline_parse() call's boolean argument has been replaced by a new
flags argument that takes a common set of bits with proc_cmdline_get_key().
g) All kernel command line APIs now begin with the same "proc_cmdline_" prefix.
h) There are now tests for much of this. Yay!
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v232's cgroup hybrid mode mounted v2 on /sys/fs/cgroup/elogind, which
unfortunately broke other tools which expect v1 there. From v233 on, hybrid
mode instead mounts and uses v2 on /sys/fs/cgroup/unified and keeps
/sys/fs/cgroup/elogind on v1 for compatibility with external tools. However,
to keep elogind live upgrades working, v233+ should be able to recognize v232
layout and keep using it.
This patch adds v232 hybrid mode support. If v232 layout is detected,
cg_unified(SYSTEMD_CGRouP_CONTROLLER) keeps returning %true but
cg_hybrid_unified() returns %false. This keeps process management on cgroup v2
but turns off the parallel layout.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SYSTEMD_CGROUP_CONTROLLER is currently defined as "name=elogind" which cgroup
utility functions interpret as a named cgroup hierarchy with the specified
named. With the planned cgroup hybrid mode changes, SYSTEMD_CGROUP_CONTROLLER
would map to different hierarchy names.
This patch makes SYSTEMD_CGROUP_CONTROLLER a special string "_elogind" which is
substituted to "name=elogind" by the cgroup utility functions. This allows the
callers to address the elogind hierarchy without actually specifying the
hierarchy name allowing the cgroup utility functions to map it to whatever is
appropriate.
Note that SYSTEMD_CGROUP_CONTROLLER was already special on full unified cgroup
hierarchy even before this patch.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cg_[all_]unified() test whether a specific controller or all controllers are on
the unified hierarchy. While what's being asked is a simple binary question,
the callers must assume that the functions may fail any time, which
unnecessarily complicates their usages. This complication is unnecessary.
Internally, the test result is cached anyway and there are only a few places
where the test actually needs to be performed.
This patch simplifies cg_[all_]unified().
* cg_[all_]unified() are updated to return bool. If the result can't be
decided, assertion failure is triggered. Error handlings from their callers
are dropped.
* cg_unified_flush() is updated to calculate the new result synchrnously and
return whether it succeeded or not. Places which need to flush the test
result are updated to test for failure. This ensures that all the following
cg_[all_]unified() tests succeed.
* Places which expected possible cg_[all_]unified() failures are updated to
call and test cg_unified_flush() before calling cg_[all_]unified(). This
includes functions used while setting up mounts during boot and
manager_setup_cgroup().
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
hierarchy
Currently the hybrid mode mounts cgroup v2 on /sys/fs/cgroup instead of the v1
name=elogind hierarchy. While this works fine for elogind itself, it breaks
tools which expect cgroup v1 hierarchy on /sys/fs/cgroup/elogind.
This patch updates the hybrid mode so that it mounts v2 hierarchy on
/sys/fs/cgroup/unified and keeps v1 "name=elogind" hierarchy on
/sys/fs/cgroup/elogind for compatibility. elogind itself doesn't depend on the
"name=elogind" hierarchy at all. All operations take place on the v2 hierarchy
as before but the v1 hierarchy is kept in sync so that any tools which expect
it to be there can keep doing so. This allows elogind to take advantage of
cgroup v2 process management without requiring other tools to be aware of the
hybrid mode.
The hybrid mode is implemented by mapping the special elogind controller to
/sys/fs/cgroup/unified and making the basic cgroup utility operations -
cg_attach(), cg_create(), cg_rmdir() and cg_trim() - also operate on the
/sys/fs/cgroup/elogind hierarchy whenever the cgroup2 hierarchy is updated.
While a bit messy, this will allow dropping complications from using cgroup v1
for process management a lot sooner than otherwise possible which should make
it a net gain in terms of maintainability.
v2: Fixed !cgns breakage reported by @evverx and renamed the unified mount
point to /sys/fs/cgroup/unified as suggested by @brauner.
v3: chown the compat hierarchy too on delegation. Suggested by @evverx.
v4: [zj]
- drop the change to default, full "legacy" is still the default.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Too many things don't get along with the unified hierarchy yet:
* https://github.com/opencontainers/runc/issues/1175
* https://github.com/docker/docker/issues/28109
* https://github.com/lxc/lxc/issues/1280
So revert the default to the legacy hierarchy for now. Developers of the above
software can opt into the unified hierarchy with
"elogind.legacy_elogind_cgroup_controller=0".
|
|
|
|
|
|
| |
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes strjoin and strjoina more similar and avoids the useless final
argument.
spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/elogind -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libelogind/sd-bus -I ./src/libelogind/sd-event -I ./src/libelogind/sd-login -I ./src/libelogind/sd-netlink -I ./src/libelogind/sd-network -I ./src/libelogind/sd-hwdb -I ./src/libelogind/sd-device -I ./src/libelogind/sd-id128 -I ./src/libelogind-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/*.c)
git grep -e '\bstrjoin\b.*NULL' -l|xargs sed -i -r 's/strjoin\((.*), NULL\)/strjoin(\1)/'
This might have missed a few cases (spatch has a really hard time dealing
with _cleanup_ macros), but that's no big issue, they can always be fixed
later.
|
|
|
|
| |
in name=foo:/elogind, where cgroup and root are both /elogind.
|
|
|
|
| |
'CGROUP_UNIFIED_NONE'
|
| |
|
| |
|
| |
|
|
|
|
| |
Fixes: #4181
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is
generated each time a unit moves from and inactive to an activating or active
state.
The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.
The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.
The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each log logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.
Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.
This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().
PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.
A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycleof it.
Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation race-freely from the
messages.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
A following patch will update cgroup handling so that the elogind controller
(/sys/fs/cgroup/elogind) can use the unified hierarchy even if the kernel
resource controllers are on the legacy hierarchies. This would require
distinguishing whether all controllers are on cgroup v2 or only the elogind
controller is. In preparation, this patch renames cg_unified() to
cg_all_unified().
This patch doesn't cause any functional changes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, elogind uses either the legacy hierarchies or the unified hierarchy.
When the legacy hierarchies are used, elogind uses a named legacy hierarchy
mounted on /sys/fs/cgroup/elogind without any kernel controllers for process
management. Due to the shortcomings in the legacy hierarchy, this involves a
lot of workarounds and complexities.
Because the unified hierarchy can be mounted and used in parallel to legacy
hierarchies, there's no reason for elogind to use a legacy hierarchy for
management even if the kernel resource controllers need to be mounted on legacy
hierarchies. It can simply mount the unified hierarchy under
/sys/fs/cgroup/elogind and use it without affecting other legacy hierarchies.
This disables a significant amount of fragile workaround logics and would allow
using features which depend on the unified hierarchy membership such bpf cgroup
v2 membership test. In time, this would also allow deleting the said
complexities.
This patch updates elogind so that it prefers the unified hierarchy for the
elogind cgroup controller hierarchy when legacy hierarchies are used for kernel
resource controllers.
* cg_unified(@controller) is introduced which tests whether the specific
controller in on unified hierarchy and used to choose the unified hierarchy
code path for process and service management when available. Kernel
controller specific operations remain gated by cg_all_unified().
* "elogind.legacy_elogind_cgroup_controller" kernel argument can be used to
force the use of legacy hierarchy for elogind cgroup controller.
* nspawn: By default nspawn uses the same hierarchies as the host. If
UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all. If
0, legacy for all.
* nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of
three options - legacy, only elogind controller on unified, and unified. The
value is passed into mount setup functions and controls cgroup configuration.
* nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount
option is moved to mount_legacy_cgroup_hierarchy() so that it can take an
appropriate action depending on the configuration of the host.
v2: - CGroupUnified enum replaces open coded integer values to indicate the
cgroup operation mode.
- Various style updates.
v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2.
v4: Restored legacy container on unified host support and fixed another bug in
detect_unified_cgroup_hierarchy().
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately, due to the disagreements in the kernel development community,
CPU controller cgroup v2 support has not been merged and enabling it requires
applying two small out-of-tree kernel patches. The situation is explained in
the following documentation.
https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu
While it isn't clear what will happen with CPU controller cgroup v2 support,
there are critical features which are possible only on cgroup v2 such as
buffered write control making cgroup v2 essential for a lot of workloads. This
commit implements elogind CPU controller support on the unified hierarchy so
that users who choose to deploy CPU controller cgroup v2 support can easily
take advantage of it.
On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max"
replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us". [Startup]CPUWeight config
options are added with the usual compat translation. CPU quota settings remain
unchanged and apply to both legacy and unified hierarchies.
v2: - Error in man page corrected.
- CPU config application in cgroup_context_apply() refactored.
- CPU accounting now works on unified hierarchy.
|
|
|
|
|
| |
- define CLONE_NEWCGROUP
- add fun to detect whether cgroup namespaces are supported
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This method is called from a systemd manager that is the system
instance to inform all user instances of systemd about the pending
cgroup release.
elogind on the other hand is always there just once. And the release
of cgroups is handled by the local cgroups manager, which should be
provided by the running init system.
Even if there is no cgroup management, so elogind sets itself up as
a small cgroups manager itself, there aren't any user instances that
could react on the forwarding anyway.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Let's lot at LOG_NOTICE about any processes that we are going to
SIGKILL/SIGABRT because clean termination of them didn't work.
This turns the various boolean flag parameters to cg_kill(), cg_migrate() and
related calls into a single binary flags parameter, simply because the function
now gained even more parameters and the parameter listed shouldn't get too
long.
Logging for killing processes is done either when the kill signal is SIGABRT or
SIGKILL, or on explicit request if KILL_TERMINATE_AND_LOG instead of LOG_TERMINATE
is passed. This isn't used yet in this patch, but is made use of in a later
patch.
|
| |
|
|
|
|
|
|
|
| |
Earlier during the development of unified hierarchy, the populated event was
reported through by the dedicated "cgroup.populated" file; however, the
interface was updated so that it's reported through the "populated" field of
"cgroup.events" file. Update populated event handling logic accordingly.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Support for net_cls.class_id through the NetClass= configuration directive
has been added in v227 in preparation for a per-unit packet filter mechanism.
However, it turns out the kernel people have decided to deprecate the net_cls
and net_prio controllers in v2. Tejun provides a comprehensive justification
for this in his commit, which has landed during the merge window for kernel
v4.5:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bd1060a1d671
As we're aiming for full support for the v2 cgroup hierarchy, we can no
longer support this feature. Userspace tool such as nftables are moving over
to setting rules that are specific to the full cgroup path of a task, which
obsoletes these controllers anyway.
This commit removes support for tweaking details in the net_cls controller,
but keeps the NetClass= directive around for legacy compatibility reasons.
|
|
|
|
|
| |
Compare errno with zero in a way that tells gcc that
(if the condition is true) errno is positive.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gcc is confused by the common idiom of
return errno ? -errno : -ESOMETHING
and thinks a positive value may be returned. Replace this condition
with errno > 0 to help gcc and avoid many spurious warnings. I filed
a gcc rfe a long time ago, but it hard to say if it will ever be
implemented [1].
Both conventions were used in the codebase, this change makes things
more consistent. This is a follow up to bcb161b0230f.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61846
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Although having a two line mask like
/// UNNEEDED by elogind
#if 0
it is much more easier to read (and patch!) if those two lines were
condense into a one-line mask start like
#if 0 /// UNNEEDED by elogind
|
|
|
|
|
| |
Apply remaining fixes and the performed move of utility functions
into their own foo-util.[hc] files on libbasic.
|
|
|
|
|
|
|
|
| |
- src/basic/cgroup-util.[hc]
- src/basic/memfd-util.[hc]
- src/basic/path-util.[hc]
- src/basic/process-util.[hc]
- src/basic/smack-util.[hc]
|
| |
|
|
|
|
|
| |
Although it is nice to have it read ELOGIND instead of SYSTEMD, all
diffs just show too many irrelevant (false) positives.
|
|
|
|
|
|
|
|
| |
The patching of elogind in several steps with only partly rebasing on
a common commit with upstream, left the tree in a state, that was
unmergeable with master. By rebasing on master and manually cleaning
up all commits, this merge is now possible.
However, this process left some orphans, that are cleanup now.
|
|
|
|
|
| |
This is done for systems, which init systems are no cgroup
controllers. One example is runit on Void Linux.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Check whether printf.h is available and define/undef HAVE_PRINTF_H
accordingly.
* Added src/shared/parse-printf-format.[hc] by Emil Renner Berthing
<systemd@esmil.dk> that provides parse_printf_format() if printf.h
is unavailable
* Added src/basic/musl_missing.h by Juergen Buchmueller
<pullmoll@t-online.de> that implements glibc functions missing in
musl libc as macros.
* Extended src/basic/musl_missing.h and added
src/basic/musl_missing.c providing
- program_invocation_name
- program_invocation_short_name and
- elogind_set_program_name() to set the two where appropriate.
* Added calls to elogind_set_program_name() to all main() functions
where needed.
* A few other fixes to work nicely with musl libc.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous variant was nice and sleek. But unfortunately, there are
constructs like:
#if 0
(... old code ...)
#else
(... alternative code for elogind ...)
#endif // 0
These fragments couldn't be handled by the old code, but can by the
new one.
To make this work, the precompiler macros must be set like shown above.
Apart from that, all lines like:
/// Any doxygen one-line-comments with elogind in it are removed
are removed, too. Please note the three slashes.
And finally, all commented out #include directives are removed as well.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- src/basic/ioprio.h - removed
- src/basic/ring.h - removed
- src/basic/capability.[hc] - cleaned
- src/basic/cgroup-util.[hc] - cleaned
- src/basic/hostname-util.[hc] - cleaned
- src/basic/path-util.[hc] - cleaned
- src/basic/socket-util.h - cleaned
- src/basic/strv.[hc] - cleaned
- src/basic/time-util.[hc] - cleaned
- src/basic/unit-name.[hc] - cleaned
- src/basic/util.[hc] - cleaned
- src/libelogind/sd-bus/bus-introspect.c - cleaned
- src/login/loginctl.c - cleaned
- src/login/logind-dbus.c - cleaned
- src/login/logind.h - cleaned
- src/shared/conf-parser.[hc] - cleaned
|
| |
|
|
|
|
|
|
|
| |
a) Add some debugging messages to track what's going on with eloginds
cgroup handling.
b) Do not create a cgroup path "/elogind" if our cgroup root is
already "/elogind".
|
|
|
|
|
|
|
|
| |
elogind has to run on any system, no matter which init system is in
control of the cgroups. So instead of hardcoding "name=foo",
configure now greps 1: in /proc/self/cgroup - which is hopefully
the right choice. (Well, to be honest, if it isn't, something is
really wrong with the running system...)
|
|
|
|
|
|
|
| |
Create a private cgroup tree associated with no controllers, and use it
to map PIDs to sessions. Since we use our own path structure, remove
internal cgroup-related helpers that interpret the cgroup path structure
to pull out users, slices, and scopes.
|
| |
|
|
|
|
|
| |
Let elogind setup cgroups support on its manager initialization and
free the cgroups subsystem when the manager is destroyed.
|
| |
|
| |
|
|
|
|
|
| |
After all a path is a path is a path and we should use path_equal() to
comapre those.
|