| Commit message (Collapse) | Author | Age |
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using SELinux with legacy cgroups the tmpfs on /sys/fs/cgroup is by
default labelled as tmpfs_t. This label is also inherited by the "cpu"
and "cpuacct" symbolic links. Unfortunately the policy expects them to
be labelled as cgroup_t, which is used for all the actual cgroup
filesystems. Failure to do so results in a stream of denials.
This state cannot be fixed reliably when the cgroup filesystem structure
is set-up as the SELinux policy is not yet loaded at this
moment. It also cannot be fixed later as the root of the cgroup
filesystem is remounted read-only. In order to fix it the root of the
cgroup filesystem needs to be temporary remounted read-write, relabelled
and remounted back read-only.
|
|
|
|
|
|
| |
Now that we don't kill control processes anymore, let's at least warn
about any processes left-over in the unit cgroup at the moment of
starting the unit.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this patch, the bpf cgroup realization state was implicitly set
to "NO", meaning that the bpf configuration was realized but was turned
off. That means invalidation requests for the bpf stuff (which we issue
in blanket fashion when doing a daemon reload) would actually later
result in a us re-realizing the unit, under the assumption it was
already realized once, even though in reality it never was realized
before.
This had the effect that after each daemon-reload we'd end up realizing
*all* defined units, even the unloaded ones, populating cgroupfs with
lots of unneeded empty cgroups.
With this fix we properly set the realiazation state to "INVALIDATED",
i.e. indicating the bpf stuff was never set up for the unit, and hence
when we try to invalidate it later we won't do anything.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
shall actually realize
We add units to the cgroup realization queue when propagating realizing
requests to sibling units, and when invalidating cgroup settings because
some cgroup setting changed. In the time between where we add the unit
to the queue until the cgroup is actually dispatched the unit's state
might have changed however, so that the unit doesn't actually need to be
realized anymore, for example because the unit went down. To handle
that, check the unit state again, if realization makes sense.
Redundant realization is usually not a problem, except when the unit is
not actually running, hence check exactly for that.
|
| |
|
|
|
|
|
| |
Now that d3070fbdf6077d7d has been merged, these errors are not as
critical as they used to be.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make sure to add the delegation mask to the mask of controllers we have
to enable on our own unit. Do not claim it was a members mask, as such
a logic would mean we'd collide with cgroupv2's "no processes on inner
nodes policy".
This change does the right thing: it means any controller enabled
through Controllers= will be made available to subcrgoups of our unit,
but the unit itself has to still enable it through
cgroup.subtree_control (which it can since that file is delegated too)
to be inherited further down.
Or to say this differently: we only should manipulate
cgroup.subtree_control ourselves for inner nodes (i.e. slices), and
for leaves we need to provide a way to enable controllers in the slices
above, but stay away from the cgroup's own cgroup.subtree_control —
which is what this patch ensures.
Fixes: #7355
|
|
|
|
| |
Just the error check and message were wrong, otherwise the logic was OK.
|
| |
|
|
|
|
|
|
|
|
| |
Previously it was not possible to select which controllers to enable for
a unit where Delegate=yes was set, as all controllers were enabled. With
this change, this is made configurable, and thus delegation units can
pick specifically what they want to manage themselves, and what they
don't care about.
|
|
|
|
|
| |
subtree_mask is own_mask | members_mask, let's make use of that to
shorten a few things
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This replaces the dependencies Set* objects by Hashmap* objects, where
the key is the depending Unit, and the value is a bitmask encoding why
the specific dependency was created.
The bitmask contains a number of different, defined bits, that indicate
why dependencies exist, for example whether they are created due to
explicitly configured deps in files, by udev rules or implicitly.
Note that memory usage is not increased by this change, even though we
store more information, as we manage to encode the bit mask inside the
value pointer each Hashmap entry contains.
Why this all? When we know how a dependency came to be, we can update
dependencies correctly when a configuration source changes but others
are left unaltered. Specifically:
1. We can fix UDEV_WANTS dependency generation: so far we kept adding
dependencies configured that way, but if a device lost such a
dependency we couldn't them again as there was no scheme for removing
of dependencies in place.
2. We can implement "pin-pointed" reload of unit files. If we know what
dependencies were created as result of configuration in a unit file,
then we know what to flush out when we want to reload it.
3. It's useful for debugging: "elogind-analyze dump" now shows
this information, helping substantially with understanding how
elogind's dependency tree came to be the way it came to be.
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
Same justification as for HAVE_UTMP.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The advantage is that is the name is mispellt, cpp will warn us.
$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build
squash! build-sys: use #if Y instead of #ifdef Y everywhere
v2:
- fix incorrect setting of HAVE_LIBIDN2
|
|
|
|
|
|
|
| |
This is particularly useful when used in conjunction with DynamicUser=1,
where the UID might change for every invocation, but is useful in other
cases too, for example, when these directories are shared between
systems where the UID assignments differ slightly.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes sure that if we learn via inotify or another event source
that a cgroup is empty, and we checked that this is indeed the case (as
we might get spurious notifications through inotify, as the inotify
logic through the "cgroups.event" is pretty unspecific and might be
trigger for a variety of reasons), then we'll enqueue a defer event for
it, at a priority lower than SIGCHLD handling, so that we know for sure
that if there's waitid() data for a process we used it before
considering the cgroup empty notification.
Fixes: #6608
|
|
|
|
|
|
|
|
|
| |
We are about to add second cgroup-related queue, called
"cgroup_empty_queue", hence let's rename "cgroup_queue" to
"cgroup_realize_queue" (as that is its purpose) to minimize confusion
about the two queues.
Just a rename, no functional changes.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
We used to be a bit sloppy on this, and handed out accounting data even
for units where accounting wasn't explicitly enabled. Let's be stricter
here, so that we know the accounting data is actually fully valid. This
is necessary, as the accounting data is no longer stored exclusively in
cgroupfs, but is partly maintained external of that, and flushed during
unit starts. We should hence only expose accounting data we really know
is fully current.
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this change we'll invalidate all cgroup settings after coming back
from a daemon reload/reexec, so that the new settings are instantly
applied.
This is useful for the BPF case, because we don't serialize/deserialize
the BPF program fd, and hence have to install a new, updated BPF program
when coming back from the reload/reexec. However, this is also useful
for the rest of the cgroup settings, as it ensures that user
configuration really takes effect wherever we can.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make sure the current IP accounting counters aren't lost during
reload/reexec.
Note that we destroy all BPF file objects during a reload: the BPF
programs, the access and the accounting maps. The former two need to be
regenerated anyway with the newly loaded configuration data, but the
latter one needs to survive reloads/reexec. In this implementation I
opted to only save/restore the accounting map content instead of the map
itself. While this opens a (theoretic) window where IP traffic is still
accounted to the old map after we read it out, and we thus miss a few
bytes this has the benefit that we can alter the map layout between
versions should the need arise.
|
| |
|
| |
|
| |
|
|
|
|
|
| |
Add pointers for compiled eBPF programs as well as list heads for allowed
and denied hosts for both directions.
|
|
|
|
|
|
|
| |
Add a config directive parser that takes multiple space separated IPv4
or IPv6 addresses with optional netmasks in CIDR notation rvalue and
puts a parsed version of it to linked list of IPAddressAccessItem objects.
The code actually using this will be added later.
|
|
|
|
|
| |
Less deviation between test runs and normal runs is always a good idea,
hence enable more stuff that is safe in test runs
|
|
|
|
|
|
|
| |
This doesn't really matter, as we never invalidate cpuacct explicitly,
and there's no real reason to care for it explicitly, however it's
prettier if we always treat cpu and cpuacct as belonging together, the
same way we conisder "io" and "blkio" to belong together.
|
|
|
|
|
|
|
|
|
|
| |
Now generators are only run in elogind --test mode, where this makes
most sense (how are you going to test what would happen otherwise?).
Fixes #6842.
v2:
- rename test_run to test_run_flags
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
Since busname units are only useful with kdbus, they weren't actively
used. This was dead code, only compile-tested. If busname units are
ever added back, it'll be cleaner to start from scratch (possibly reverting
parts of this patch).
|
|
|
|
|
|
|
|
| |
Upstream thinks, that the auto tools are too 'legacy', or that they
are at least no longer fitting.
We follow, as the classic auto tools files have been removed, so no
other choice here...
|
| |
|
|
|
|
|
|
|
|
| |
All those uses were correct, but I think it's better to be explicit.
Using implicit errno is too error prone, and with this change we can require
(in the sense of a style guideline) that the code is always specified.
Helpful query: git grep -n -P 'log_[^s][a-z]+\(.*%m'
|
|
|
|
|
| |
These functions, although not used by elogind itself, are mostly tiny
and crucial for important tests to work.
|
| |
|
|
|
|
|
| |
cg_unified() is a bit generic a name, let's make clear that it checks
whether a specified controller is in unified mode.
|
|
|
|
|
|
|
|
|
| |
We use our cgroup APIs in various contexts, including from our libraries
sd-login, sd-bus. As we don#t control those environments we can't rely
that the unified cgroup setup logic succeeds, and hence really shouldn't
assert on it.
This more or less reverts 415fc41ceaeada2e32639f24f134b1c248b9e43f.
|
|
|
|
|
|
|
|
|
|
|
| |
We need this to gracefully support older or strangely configured kernels.
v2:
- do not install a callback handler, just embed the right conditions into
cg_is_*_wanted()
v3:
- fix bug in cg_is_legacy_wanted()
|