| Commit message (Collapse) | Author | Age |
| |
|
|
|
|
|
|
|
| |
perl -i -0pe 's/\s*Copyright © .... Zbigniew Jędrzejewski.*?\n/\n/gms' man/*xml
git grep -e 'Copyright.*Jędrzejewski' -l | xargs perl -i -0pe 's/(#\n)?# +Copyright © [0-9, -]+ Zbigniew Jędrzejewski.*?\n//gms'
git grep -e 'Copyright.*Jędrzejewski' -l | xargs perl -i -0pe 's/\s*\/\*\*\*\s+Copyright © [0-9, -]+ Zbigniew Jędrzejewski[^\n]*?\s*\*\*\*\/\s*/\n\n/gms'
git grep -e 'Copyright.*Jędrzejewski' -l | xargs perl -i -0pe 's/\s+Copyright © [0-9, -]+ Zbigniew Jędrzejewski[^\n]*//gms'
|
|
|
|
|
|
| |
Let's unify an beautify our remaining copyright statements, with a
unicode ©. This means our copyright statements are now always formatted
the same way. Yay.
|
|
|
|
|
|
|
|
|
|
|
| |
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
|
|
|
|
|
| |
These devices might show up later, hence leave the rules as they are.
Applying the limits should not alter configuration.
|
|
|
|
|
|
| |
Let's always write "1 << 0", "1 << 1" and so on, except where we need
more than 31 flag bits, where we write "UINT64(1) << 0", and so on to force
64bit values.
|
|
|
|
| |
Let's chase block devices through btrfs and LUKS like we do elsewhere.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes #8364.
Reproducer:
$ sudo systemd-run -t -p Delegate=yes bash
# mkdir /sys/fs/cgroup/system.slice/run-u6958.service/supervisor
# echo $$ > /sys/fs/cgroup/system.slice/run-u6958.service/supervisor/cgroup.procs
# echo +memory > /sys/fs/cgroup/system.slice/run-u6958.service/cgroup.subtree_control
# cat /sys/fs/cgroup/system.slice/run-u6958.service/cgroup.subtree_control
memory
# systemctl daemon-reload
# cat /sys/fs/cgroup/system.slice/run-u6958.service/cgroup.subtree_control
(empty)
With patch, the last command shows 'memory'.
|
| |
|
|
|
|
|
|
|
|
|
| |
This way we don't need to repeat the argument twice.
I didn't replace all instances. I think it's better to leave out:
- asserts
- comparisons like x & y == x, which are mathematically equivalent, but
here we aren't checking if flags are set, but if the argument fits in the
flags.
|
|
|
|
|
|
|
|
| |
The function is similar to path_kill_slashes() but also removes
initial './', trailing '/.', and '/./' in the path.
When the second argument of path_simplify() is false, then it
behaves as the same as path_kill_slashes(). Hence, this also
replaces path_kill_slashes() with path_simplify().
|
|
|
|
|
|
|
| |
f1470e424b2b5337e3c383d68dc5a26af1ff4ce6 removed one check, but missed a similar
one a few lines down.
CID #1390949.
|
| |
|
|
|
|
|
|
|
| |
k was set to join_controllers at this point and only incremented, so
it cannot be null at this point.
CID #1390949.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
"optional" (#8893)
if we lack privs to create device nodes that's fine, and creating
/run/systemd/inaccessible/chr or /run/systemd/inaccessible/blk won't
work then. Document this in longer comments.
Fixes: #4484
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(or in terms of the names of the actual files on disk, do not link
libsystemd-shared-238.a into libcore.a).
libsystemd_static is linked into libsystemd_shared, which in turn means that
anything that links to libcore and libsystemd_shared will get libsystemd_static
twice:
$ cc -o systemd 'systemd@exe/src_core_main.c.o' -Wl,--no-undefined -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -pie -DVALGRIND -Wl,--start-group src/core/libcore.a src/shared/libsystemd-shared-238.a src/shared/libsystemd-shared-238.so -pthread -lrt -lseccomp -lselinux -lmount -lblkid -Wl,--end-group -lseccomp -lpam -L/lib64 -laudit -lkmod -lmount -lrt -lcap -lacl -lcryptsetup -lgcrypt -lip4tc -lip6tc -lseccomp -lselinux -lidn -llzma -llz4 -lblkid '-Wl,-rpath,$ORIGIN/src/shared' -Wl,-rpath-link,/home/zbyszek/src/systemd/build/src/shared
This propagation of the dependency seems correct (in the sense that meson is
doing the expected thing based on the given configuration). Linking was done
this way in the original meson conversion. I was probably trying to get
everything to compile and link, I'm not sure why this particular choice was
made. In the meantime, meson has gotten better at propagating dependencies, so
it's possible that this had slightly different effect in the original
conversion, but I did not verify this. Either way, I think we should drop this.
With the patch:
$ cc -o systemd 'systemd@exe/src_core_main.c.o' -Wl,--no-undefined -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -pie -DVALGRIND -Wl,--start-group src/core/libcore.a src/shared/libsystemd-shared-238.so -pthread -lrt -lseccomp -lselinux -lmount -Wl,--end-group -lblkid -lrt -lseccomp -lpam -L/lib64 -laudit -lkmod -lselinux -lmount '-Wl,-rpath,$ORIGIN/src/shared' -Wl,-rpath-link,/home/zbyszek/src/systemd/build/src/shared
This is more correct because we're not linking the same code twice.
With the patch, libystemd_static is used in exactly four places:
- src/shared/libsystemd-shared-238.so
- src/udev/libudev.so.1.6.10
- pam_systemd.so
- test-bus-error
(compared to a bunch more executables before, including systemd,
systemd-analyze, test-hostname, test-ns, etc.)
Size savings are also noticable:
$ size /var/tmp/inst?/usr/lib/systemd/libsystemd-shared-238.so
text data bss dec hex filename
2397826 578488 15920 2992234 2da86a /var/tmp/inst1/usr/lib/systemd/libsystemd-shared-238.so
2397826 578488 15920 2992234 2da86a /var/tmp/inst2/usr/lib/systemd/libsystemd-shared-238.so
$ size /var/tmp/inst?/usr/lib/systemd/systemd
text data bss dec hex filename
1858790 261688 9320 2129798 207f86 /var/tmp/inst1/usr/lib/systemd/systemd
1556358 258704 8072 1823134 1bd19e /var/tmp/inst2/usr/lib/systemd/systemd
$ du -s /var/tmp/inst?
52216 /var/tmp/inst1
50844 /var/tmp/inst2
https://github.com/google/oss-fuzz/issues/1330#issuecomment-384054530 might be related.
|
|
|
|
|
|
|
|
|
| |
This changes
r = ...;
if (r < 0)
to
if (... < 0)
when r will not be used again.
|
|
|
|
|
|
|
|
|
| |
Double newlines (i.e. one empty lines) are great to structure code. But
let's avoid triple newlines (i.e. two empty lines), quadruple newlines,
quintuple newlines, …, that's just spurious whitespace.
It's an easy way to drop 121 lines of code, and keeps the coding style
of our sources a bit tigther.
|
|
|
|
|
|
|
|
| |
We check the same condition at various places. Let's add a trivial,
common helper for this, and use it everywhere.
It's not going to make things much faster or much shorter, but I think a
lot more readable
|
|
|
|
|
|
|
|
|
|
| |
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
|
|
|
|
|
|
|
| |
When `systemd` is run in the TEST_RUN_MINIMAL mode, it doesn't really
set up cgroups, so it shouldn't try to remove anything.
Closes https://github.com/systemd/systemd/issues/8474.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The initial fix for relabelling the cgroup filesystem for
SELinux delivered in commit 8739f23e3 was based on the assumption that
the cgroup filesystem is already populated once mount_setup() is
executed, which was true for my system. What I wasn't aware is that this
is the case only when another instance of systemd was running before
this one, which can happen if systemd is used in the initrd (for ex. by
dracut).
In case of a clean systemd start-up the cgroup filesystem is actually
being populated after mount_setup() and does not need relabelling as at
that moment the SELinux policy is already loaded. Since however the root
cgroup filesystem was remounted read-only in the meantime this operation
will now fail.
To fix this check for the filesystem mount flags before relabelling and
only remount ro->rw->ro if necessary and leave the filesystem read-write
otherwise.
Fixes #7901.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reworks the SELinux and SMACK label fixing calls in a number of
ways:
1. The two separate boolean arguments of these functions are converted
into a flags type LabelFixFlags.
2. The operations are now implemented based on O_PATH. This should
resolve TTOCTTOU races between determining the label for the file
system object and applying it, as it it allows to pin the object
while we are operating on it.
3. When changing a label fails we'll query the label previously set, and
if matches what we want to set anyway we'll suppress the error.
Also, all calls to label_fix() are now (void)ified, when we ignore the
return values.
Fixes: #8566
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.
This takes inspiration from Rust:
https://doc.rust-lang.org/std/option/enum.Option.html#method.take
and was suggested by Alan Jenkins (@sourcejedi).
It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A little misunderstanding has been fixed, and elogind now mounts the following
directories, if (and only if) it has to act as its own cgroups controller.
* -Ddefault-hierarchy=legacy : /sys/fs/cgroup as tmpfs
/sys/fs/cgroup/elogind as cgroup
* -Ddefault-hierarchy=hybrid : The same as with 'legacy', plus
/sys/fs/cgroup/unified as cgroup2
* -Ddefault-hierarchy=unified : /sys/fs/cgroup2 as cgroup2
|
|
|
|
| |
not init.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The initial fix for relabelling the cgroup filesystem for
SELinux delivered in commit 8739f23e3 was based on the assumption that
the cgroup filesystem is already populated once mount_setup() is
executed, which was true for my system. What I wasn't aware is that this
is the case only when another instance of systemd was running before
this one, which can happen if systemd is used in the initrd (for ex. by
dracut).
In case of a clean systemd start-up the cgroup filesystem is actually
being populated after mount_setup() and does not need relabelling as at
that moment the SELinux policy is already loaded. Since however the root
cgroup filesystem was remounted read-only in the meantime this operation
will now fail.
To fix this check for the filesystem mount flags before relabelling and
only remount ro->rw->ro if necessary and leave the filesystem read-write
otherwise.
Fixes #7901.
(cherry picked from commit 6f7729c1767998110c4460c85c94435c5782a613)
Also https://bugzilla.redhat.com/show_bug.cgi?id=1576240.
|
|
|
|
|
|
|
|
|
| |
When `systemd` is run in the TEST_RUN_MINIMAL mode, it doesn't really
set up cgroups, so it shouldn't try to remove anything.
Closes https://github.com/systemd/systemd/issues/8474.
(cherry picked from commit f6c63f6fc90040f0017a7cc37f3a05d5b86226d7)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reproducer:
$ meson build && cd build
$ ninja
$ sudo useradd test
$ sudo su test
$ ./systemd --system --test
...
Failed to create /user.slice/user-1000.slice/session-6.scope/init.scope control group: Permission denied
Failed to allocate manager object: Permission denied
Above error message is caused by the fact that user test didn't have its
own session and we tried to set up init.scope already running as user
test in the directory owned by different user.
Let's try to setup cgroup hierarchy, but if that fails return error only
when not running in the test mode.
Fixes #8072
(cherry picked from commit aa77e234fce7413b7dd64f99ea51450f2e2e9dbd)
|
|
|
|
|
|
| |
Fixes #8387.
(cherry picked from commit 5cbaad2f6795088db56063d20695c6444595822f)
|
|
|
|
| |
src/core (2/6)
|
| |
|
|
|
|
|
| |
After discussing with the kernel folks, we agreed to default to 0700 for
this. Better safe than sorry.
|
|
|
|
|
|
|
|
| |
The kernel exposes the necessary data in /proc anyway, let's expose it
hence by default.
With this in place "systemctl status -- -.slice" will show accounting
data out-of-the-box now.
|
|
|
|
|
| |
Do what we already prepped in cgtop for the root slice in PID 1 too:
consult /proc for the data we need.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Let's make sure we don't clobber the return parameter on failure, to
follow our coding style. Also, break the loop early if we have all
attributes we need.
This also changes the keys parameter to a simple char**, so that we can
use STRV_MAKE() for passing the list of attributes to read.
This also makes it possible to distuingish the case when the whole
attribute file doesn't exist from one key in it missing. In the former
case we return -ENOENT, in the latter we now return -ENXIO.
|
|
|
|
|
|
|
| |
So far, for all our API VFS mounts we used the fstype also as mount
source, let's do that for the cgroupsv2 mounts too. The kernel doesn't
really care about the source for API VFS, but it's visible to the user,
hence let's clean this up and follow the rule we otherwise follow.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This new kernel 4.15 flag permits that multiple BPF programs can be
executed for each packet processed: multiple per cgroup plus all
programs defined up the tree on all parent cgroups.
We can use this for two features:
1. Finally provide per-slice IP accounting (which was previously
unavailable)
2. Permit delegation of BPF programs to services (i.e. leaf nodes).
This patch beefs up PID1's handling of BPF to enable both.
Note two special items to keep in mind:
a. Our inner-node BPF programs (i.e. the ones we attach to slices) do
not enforce IP access lists, that's done exclsuively in the leaf-node
BPF programs. That's a good thing, since that way rules in leaf nodes
can cancel out rules further up (i.e. for example to implement a
logic of "disallow everything except httpd.service"). Inner node BPF
programs to accounting however if that's requested. This is
beneficial for performance reasons: it means in order to provide
per-slice IP accounting we don't have to add up all child unit's
data.
b. When this code is run on pre-4.15 kernel (i.e. where
BPF_F_ALLOW_MULTI is not available) we'll make IP acocunting on slice
units unavailable (i.e. revert to behaviour from before this commit).
For leaf nodes we'll fallback to non-ALLOW_MULTI mode however, which
means that BPF delegation is not available there at all, if IP
fw/acct is turned on for the unit. This is a change from earlier
behaviour, where we use the BPF_F_ALLOW_OVERRIDE flag, so that our
fw/acct would lose its effect as soon as delegation was turned on and
some client made use of that. I think the new behaviour is the safer
choice in this case, as silent bypassing of our fw rules is not
possible anymore. And if people want proper delegation then the way
out is a more modern kernel or turning off IP firewalling/acct for
the unit algother.
|
|
|
|
|
|
| |
We make heavy use of BPF functionality these days, hence expose the BPF
file system too by default now. (Note however, that we don't actually
make use bpf file systems object yet, but we might later on too.)
|
|
|
|
|
| |
We're moving towards unified cgroup hierarchy where this is not necessary.
This makes main.c a bit simpler.
|
|
|
|
|
| |
This file is now undergoing just one transformation, so drop the unnecessary
suffix.
|
|
|
|
|
|
|
| |
* Don't merge translations into the files
* Add gettext-domain="systemd" to description and message
Closes #8162, replaces #8118.
|
|
|
|
|
| |
This makes it easy to set the default for distributions and users which want to
default to off because they primarily use older kernels.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
units
This adds a new bus call to service and scope units called
AttachProcesses() that moves arbitrary processes into the cgroup of the
unit. The primary user for this new API is systemd itself: the systemd
--user instance uses this call of the systemd --system instance to
migrate processes if itself gets the request to migrate processes and
the kernel refuses this due to access restrictions.
The primary use-case of this is to make "systemd-run --scope --user …"
invoked from user session scopes work correctly on pure cgroupsv2
environments. There, the kernel refuses to migrate processes between two
unprivileged-owned cgroups unless the requestor as well as the ownership
of the closest parent cgroup all match. This however is not the case
between the session-XYZ.scope unit of a login session and the
user@ABC.service of the systemd --user instance.
The new logic always tries to move the processes on its own, but if
that doesn't work when being the user manager, then the system manager
is asked to do it instead.
The new operation is relatively restrictive: it will only allow to move
the processes like this if the caller is root, or the UID of the target
unit, caller and process all match. Note that this means that
unprivileged users cannot attach processes to scope units, as those do
not have "owning" users (i.e. they have now User= field).
Fixes: #3388
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
scope and service units only
Currently we allowed delegation for alluntis with cgroup backing
except for slices. Let's make this a bit more strict for now, and only
allow this in service and scope units.
Let's also add a generic accessor unit_cgroup_delegate() for checking
whether a unit has delegation turned on that checks the new bool first.
Also, when doing transient units, let's explcitly refuse turning on
delegation for unit types that don#t support it. This is mostly
cosmetical as we wouldn't act on the delegation request anyway, but
certainly helpful for debugging.
|
|
|
|
|
|
|
|
|
|
| |
The cgroup "pids" controller is not supported on the root cgroup.
However we expose TasksMax= on it, but currently don't actually apply it
to anything. Let's correct this: if set, let's propagate things to the
right sysctls.
This way we can expose TasksMax= on all units in a somewhat sensible
way.
|