summaryrefslogtreecommitdiff
path: root/src/core
Commit message (Collapse)AuthorAge
* Revert in-tree changes to upstream src.Mark Hindley2018-11-01
|
* Drop my copyright headersZbigniew Jędrzejewski-Szmek2018-08-24
| | | | | | | perl -i -0pe 's/\s*Copyright © .... Zbigniew Jędrzejewski.*?\n/\n/gms' man/*xml git grep -e 'Copyright.*Jędrzejewski' -l | xargs perl -i -0pe 's/(#\n)?# +Copyright © [0-9, -]+ Zbigniew Jędrzejewski.*?\n//gms' git grep -e 'Copyright.*Jędrzejewski' -l | xargs perl -i -0pe 's/\s*\/\*\*\*\s+Copyright © [0-9, -]+ Zbigniew Jędrzejewski[^\n]*?\s*\*\*\*\/\s*/\n\n/gms' git grep -e 'Copyright.*Jędrzejewski' -l | xargs perl -i -0pe 's/\s+Copyright © [0-9, -]+ Zbigniew Jędrzejewski[^\n]*//gms'
* tree-wide: beautify remaining copyright statementsLennart Poettering2018-08-24
| | | | | | Let's unify an beautify our remaining copyright statements, with a unicode ©. This means our copyright statements are now always formatted the same way. Yay.
* tree-wide: remove Lennart's copyright linesLennart Poettering2018-08-24
| | | | | | | | | | | These lines are generally out-of-date, incomplete and unnecessary. With SPDX and git repository much more accurate and fine grained information about licensing and authorship is available, hence let's drop the per-file copyright notice. Of course, removing copyright lines of others is problematic, hence this commit only removes my own lines and leaves all others untouched. It might be nicer if sooner or later those could go away too, making git the only and accurate source of authorship information.
* tree-wide: drop 'This file is part of systemd' blurbLennart Poettering2018-08-24
| | | | | | | | | | | | | | | | This part of the copyright blurb stems from the GPL use recommendations: https://www.gnu.org/licenses/gpl-howto.en.html The concept appears to originate in times where version control was per file, instead of per tree, and was a way to glue the files together. Ultimately, we nowadays don't live in that world anymore, and this information is entirely useless anyway, as people are very welcome to copy these files into any projects they like, and they shouldn't have to change bits that are part of our copyright header for that. hence, let's just get rid of this old cruft, and shorten our codebase a bit.
* core: when applying io/blkio per-device rules, don't remove them if they failLennart Poettering2018-08-24
| | | | | These devices might show up later, hence leave the rules as they are. Applying the limits should not alter configuration.
* tree-wide: unify how we define bit mak enumsLennart Poettering2018-08-24
| | | | | | Let's always write "1 << 0", "1 << 1" and so on, except where we need more than 31 flag bits, where we write "UINT64(1) << 0", and so on to force 64bit values.
* cgroup: beef up device lookup logic for block devicesLennart Poettering2018-08-24
| | | | Let's chase block devices through btrfs and LUKS like we do elsewhere.
* pid1: do not reset subtree_control on already-existing units with delegationZbigniew Jędrzejewski-Szmek2018-08-24
| | | | | | | | | | | | | | | | | Fixes #8364. Reproducer: $ sudo systemd-run -t -p Delegate=yes bash # mkdir /sys/fs/cgroup/system.slice/run-u6958.service/supervisor # echo $$ > /sys/fs/cgroup/system.slice/run-u6958.service/supervisor/cgroup.procs # echo +memory > /sys/fs/cgroup/system.slice/run-u6958.service/cgroup.subtree_control # cat /sys/fs/cgroup/system.slice/run-u6958.service/cgroup.subtree_control memory # systemctl daemon-reload # cat /sys/fs/cgroup/system.slice/run-u6958.service/cgroup.subtree_control (empty) With patch, the last command shows 'memory'.
* cgroup: tiny log message tweak, say that we ignore one kind of failureLennart Poettering2018-08-24
|
* Add macro for checking if some flags are setZbigniew Jędrzejewski-Szmek2018-08-24
| | | | | | | | | This way we don't need to repeat the argument twice. I didn't replace all instances. I think it's better to leave out: - asserts - comparisons like x & y == x, which are mathematically equivalent, but here we aren't checking if flags are set, but if the argument fits in the flags.
* path-util: introduce path_simplify()Yu Watanabe2018-08-24
| | | | | | | | The function is similar to path_kill_slashes() but also removes initial './', trailing '/.', and '/./' in the path. When the second argument of path_simplify() is false, then it behaves as the same as path_kill_slashes(). Hence, this also replaces path_kill_slashes() with path_simplify().
* core/mount-setup: remove part of check which is always trueZbigniew Jędrzejewski-Szmek2018-08-24
| | | | | | | f1470e424b2b5337e3c383d68dc5a26af1ff4ce6 removed one check, but missed a similar one a few lines down. CID #1390949.
* core: Break circular dependency between unit.h and cgroup.hFelipe Sateler2018-08-24
|
* core/mount-setup: remove part of check which is always trueZbigniew Jędrzejewski-Szmek2018-08-24
| | | | | | | k was set to join_controllers at this point and only incremented, so it cannot be null at this point. CID #1390949.
* meson: generate m4 preprocessor from config.h (#8914)Yu Watanabe2018-08-24
|
* mount-setup: add a comment that the character/block device nodes are ↵Lennart Poettering2018-08-24
| | | | | | | | | | "optional" (#8893) if we lack privs to create device nodes that's fine, and creating /run/systemd/inaccessible/chr or /run/systemd/inaccessible/blk won't work then. Document this in longer comments. Fixes: #4484
* meson: do not link libsystemd_static into libcore (#8813)Zbigniew Jędrzejewski-Szmek2018-08-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (or in terms of the names of the actual files on disk, do not link libsystemd-shared-238.a into libcore.a). libsystemd_static is linked into libsystemd_shared, which in turn means that anything that links to libcore and libsystemd_shared will get libsystemd_static twice: $ cc -o systemd 'systemd@exe/src_core_main.c.o' -Wl,--no-undefined -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -pie -DVALGRIND -Wl,--start-group src/core/libcore.a src/shared/libsystemd-shared-238.a src/shared/libsystemd-shared-238.so -pthread -lrt -lseccomp -lselinux -lmount -lblkid -Wl,--end-group -lseccomp -lpam -L/lib64 -laudit -lkmod -lmount -lrt -lcap -lacl -lcryptsetup -lgcrypt -lip4tc -lip6tc -lseccomp -lselinux -lidn -llzma -llz4 -lblkid '-Wl,-rpath,$ORIGIN/src/shared' -Wl,-rpath-link,/home/zbyszek/src/systemd/build/src/shared This propagation of the dependency seems correct (in the sense that meson is doing the expected thing based on the given configuration). Linking was done this way in the original meson conversion. I was probably trying to get everything to compile and link, I'm not sure why this particular choice was made. In the meantime, meson has gotten better at propagating dependencies, so it's possible that this had slightly different effect in the original conversion, but I did not verify this. Either way, I think we should drop this. With the patch: $ cc -o systemd 'systemd@exe/src_core_main.c.o' -Wl,--no-undefined -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -pie -DVALGRIND -Wl,--start-group src/core/libcore.a src/shared/libsystemd-shared-238.so -pthread -lrt -lseccomp -lselinux -lmount -Wl,--end-group -lblkid -lrt -lseccomp -lpam -L/lib64 -laudit -lkmod -lselinux -lmount '-Wl,-rpath,$ORIGIN/src/shared' -Wl,-rpath-link,/home/zbyszek/src/systemd/build/src/shared This is more correct because we're not linking the same code twice. With the patch, libystemd_static is used in exactly four places: - src/shared/libsystemd-shared-238.so - src/udev/libudev.so.1.6.10 - pam_systemd.so - test-bus-error (compared to a bunch more executables before, including systemd, systemd-analyze, test-hostname, test-ns, etc.) Size savings are also noticable: $ size /var/tmp/inst?/usr/lib/systemd/libsystemd-shared-238.so text data bss dec hex filename 2397826 578488 15920 2992234 2da86a /var/tmp/inst1/usr/lib/systemd/libsystemd-shared-238.so 2397826 578488 15920 2992234 2da86a /var/tmp/inst2/usr/lib/systemd/libsystemd-shared-238.so $ size /var/tmp/inst?/usr/lib/systemd/systemd text data bss dec hex filename 1858790 261688 9320 2129798 207f86 /var/tmp/inst1/usr/lib/systemd/systemd 1556358 258704 8072 1823134 1bd19e /var/tmp/inst2/usr/lib/systemd/systemd $ du -s /var/tmp/inst? 52216 /var/tmp/inst1 50844 /var/tmp/inst2 https://github.com/google/oss-fuzz/issues/1330#issuecomment-384054530 might be related.
* tree-wide: avoid assignment of r just to use in a comparisonZbigniew Jędrzejewski-Szmek2018-08-24
| | | | | | | | | This changes r = ...; if (r < 0) to if (... < 0) when r will not be used again.
* tree-wide: drop spurious newlines (#8764)Lennart Poettering2018-08-24
| | | | | | | | | Double newlines (i.e. one empty lines) are great to structure code. But let's avoid triple newlines (i.e. two empty lines), quadruple newlines, quintuple newlines, …, that's just spurious whitespace. It's an easy way to drop 121 lines of code, and keeps the coding style of our sources a bit tigther.
* util-lib: introduce new empty_or_root() helper (#8746)Lennart Poettering2018-08-24
| | | | | | | | We check the same condition at various places. Let's add a trivial, common helper for this, and use it everywhere. It's not going to make things much faster or much shorter, but I think a lot more readable
* tree-wide: drop license boilerplateZbigniew Jędrzejewski-Szmek2018-08-24
| | | | | | | | | | Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc), are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt. I also kept any copyright lines. We can probably remove them, but it'd nice to obtain explicit acks from all involved authors before doing that.
* core: skip the removal of cgroups in the TEST_RUN_MINIMAL mode (#8622)Evgeny Vereshchagin2018-08-24
| | | | | | | When `systemd` is run in the TEST_RUN_MINIMAL mode, it doesn't really set up cgroups, so it shouldn't try to remove anything. Closes https://github.com/systemd/systemd/issues/8474.
* machine-image,mount-setup: minor coding style fixesLennart Poettering2018-08-24
|
* core: dont't remount /sys/fs/cgroup for relabel if not needed (#8595)Krzysztof Nowicki2018-08-24
| | | | | | | | | | | | | | | | | | | | | | The initial fix for relabelling the cgroup filesystem for SELinux delivered in commit 8739f23e3 was based on the assumption that the cgroup filesystem is already populated once mount_setup() is executed, which was true for my system. What I wasn't aware is that this is the case only when another instance of systemd was running before this one, which can happen if systemd is used in the initrd (for ex. by dracut). In case of a clean systemd start-up the cgroup filesystem is actually being populated after mount_setup() and does not need relabelling as at that moment the SELinux policy is already loaded. Since however the root cgroup filesystem was remounted read-only in the meantime this operation will now fail. To fix this check for the filesystem mount flags before relabelling and only remount ro->rw->ro if necessary and leave the filesystem read-write otherwise. Fixes #7901.
* label: rework label_fix() implementations (#8583)Lennart Poettering2018-08-24
| | | | | | | | | | | | | | | | | | | | | This reworks the SELinux and SMACK label fixing calls in a number of ways: 1. The two separate boolean arguments of these functions are converted into a flags type LabelFixFlags. 2. The operations are now implemented based on O_PATH. This should resolve TTOCTTOU races between determining the label for the file system object and applying it, as it it allows to pin the object while we are operating on it. 3. When changing a label fails we'll query the label previously set, and if matches what we want to set anyway we'll suppress the error. Also, all calls to label_fix() are now (void)ified, when we ignore the return values. Fixes: #8566
* macro: introduce TAKE_PTR() macroLennart Poettering2018-08-24
| | | | | | | | | | | | | | | | This macro will read a pointer of any type, return it, and set the pointer to NULL. This is useful as an explicit concept of passing ownership of a memory area between pointers. This takes inspiration from Rust: https://doc.rust-lang.org/std/option/enum.Option.html#method.take and was suggested by Alan Jenkins (@sourcejedi). It drops ~160 lines of code from our codebase, which makes me like it. Also, I think it clarifies passing of ownership, and thus helps readability a bit (at least for the initiated who know the new macro)
* Fix foul typo in f281944Sven Eden2018-06-29
|
* Fix cgroup directory mounting:Sven Eden2018-06-29
| | | | | | | | | | | | | A little misunderstanding has been fixed, and elogind now mounts the following directories, if (and only if) it has to act as its own cgroups controller. * -Ddefault-hierarchy=legacy : /sys/fs/cgroup as tmpfs /sys/fs/cgroup/elogind as cgroup * -Ddefault-hierarchy=hybrid : The same as with 'legacy', plus /sys/fs/cgroup/unified as cgroup2 * -Ddefault-hierarchy=unified : /sys/fs/cgroup2 as cgroup2
* Prep v238: Mask cg_trim() call in manager_shutdown_cgroup() as elogind is ↵Sven Eden2018-06-28
| | | | not init.
* core: dont't remount /sys/fs/cgroup for relabel if not needed (#8595)Krzysztof Nowicki2018-06-28
| | | | | | | | | | | | | | | | | | | | | | | | | | The initial fix for relabelling the cgroup filesystem for SELinux delivered in commit 8739f23e3 was based on the assumption that the cgroup filesystem is already populated once mount_setup() is executed, which was true for my system. What I wasn't aware is that this is the case only when another instance of systemd was running before this one, which can happen if systemd is used in the initrd (for ex. by dracut). In case of a clean systemd start-up the cgroup filesystem is actually being populated after mount_setup() and does not need relabelling as at that moment the SELinux policy is already loaded. Since however the root cgroup filesystem was remounted read-only in the meantime this operation will now fail. To fix this check for the filesystem mount flags before relabelling and only remount ro->rw->ro if necessary and leave the filesystem read-write otherwise. Fixes #7901. (cherry picked from commit 6f7729c1767998110c4460c85c94435c5782a613) Also https://bugzilla.redhat.com/show_bug.cgi?id=1576240.
* core: skip the removal of cgroups in the TEST_RUN_MINIMAL mode (#8622)Evgeny Vereshchagin2018-06-28
| | | | | | | | | When `systemd` is run in the TEST_RUN_MINIMAL mode, it doesn't really set up cgroups, so it shouldn't try to remove anything. Closes https://github.com/systemd/systemd/issues/8474. (cherry picked from commit f6c63f6fc90040f0017a7cc37f3a05d5b86226d7)
* core: ignore errors from cg_create_and_attach() in test mode (#8401)Michal Sekletar2018-06-28
| | | | | | | | | | | | | | | | | | | | | | | | Reproducer: $ meson build && cd build $ ninja $ sudo useradd test $ sudo su test $ ./systemd --system --test ... Failed to create /user.slice/user-1000.slice/session-6.scope/init.scope control group: Permission denied Failed to allocate manager object: Permission denied Above error message is caused by the fact that user test didn't have its own session and we tried to set up init.scope already running as user test in the directory owned by different user. Let's try to setup cgroup hierarchy, but if that fails return error only when not running in the test mode. Fixes #8072 (cherry picked from commit aa77e234fce7413b7dd64f99ea51450f2e2e9dbd)
* core: do not free heap-allocated strings (#8391)Yu Watanabe2018-06-28
| | | | | | Fixes #8387. (cherry picked from commit 5cbaad2f6795088db56063d20695c6444595822f)
* Prep v238: Uncomment now needed headers and unmask now needed functions in ↵Sven Eden2018-06-05
| | | | src/core (2/6)
* Prep v238: Applied some upstream updates to src/core (4/5)Sven Eden2018-06-04
|
* mount-setup: change bpf mount mode to 0700 (#8334)Lennart Poettering2018-05-30
| | | | | After discussing with the kernel folks, we agreed to default to 0700 for this. Better safe than sorry.
* core: turn on memory/cpu/tasks accounting by default for the root sliceLennart Poettering2018-05-30
| | | | | | | | The kernel exposes the necessary data in /proc anyway, let's expose it hence by default. With this in place "systemctl status -- -.slice" will show accounting data out-of-the-box now.
* core: hook up /proc queries for the root slice, tooLennart Poettering2018-05-30
| | | | | Do what we already prepped in cgtop for the root slice in PID 1 too: consult /proc for the data we need.
* cgroup-util: rework cg_get_keyed_attribute() a bitLennart Poettering2018-05-30
| | | | | | | | | | | | | Let's make sure we don't clobber the return parameter on failure, to follow our coding style. Also, break the loop early if we have all attributes we need. This also changes the keys parameter to a simple char**, so that we can use STRV_MAKE() for passing the list of attributes to read. This also makes it possible to distuingish the case when the whole attribute file doesn't exist from one key in it missing. In the former case we return -ENOENT, in the latter we now return -ENXIO.
* mount-setup: always use the same source as fstype for the API VFS we mountLennart Poettering2018-05-30
| | | | | | | So far, for all our API VFS mounts we used the fstype also as mount source, let's do that for the cgroupsv2 mounts too. The kernel doesn't really care about the source for API VFS, but it's visible to the user, hence let's clean this up and follow the rule we otherwise follow.
* bpf: use BPF_F_ALLOW_MULTI flag if it is availableLennart Poettering2018-05-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This new kernel 4.15 flag permits that multiple BPF programs can be executed for each packet processed: multiple per cgroup plus all programs defined up the tree on all parent cgroups. We can use this for two features: 1. Finally provide per-slice IP accounting (which was previously unavailable) 2. Permit delegation of BPF programs to services (i.e. leaf nodes). This patch beefs up PID1's handling of BPF to enable both. Note two special items to keep in mind: a. Our inner-node BPF programs (i.e. the ones we attach to slices) do not enforce IP access lists, that's done exclsuively in the leaf-node BPF programs. That's a good thing, since that way rules in leaf nodes can cancel out rules further up (i.e. for example to implement a logic of "disallow everything except httpd.service"). Inner node BPF programs to accounting however if that's requested. This is beneficial for performance reasons: it means in order to provide per-slice IP accounting we don't have to add up all child unit's data. b. When this code is run on pre-4.15 kernel (i.e. where BPF_F_ALLOW_MULTI is not available) we'll make IP acocunting on slice units unavailable (i.e. revert to behaviour from before this commit). For leaf nodes we'll fallback to non-ALLOW_MULTI mode however, which means that BPF delegation is not available there at all, if IP fw/acct is turned on for the unit. This is a change from earlier behaviour, where we use the BPF_F_ALLOW_OVERRIDE flag, so that our fw/acct would lose its effect as soon as delegation was turned on and some client made use of that. I think the new behaviour is the safer choice in this case, as silent bypassing of our fw rules is not possible anymore. And if people want proper delegation then the way out is a more modern kernel or turning off IP firewalling/acct for the unit algother.
* bpf: mount bpffs by default on bootLennart Poettering2018-05-30
| | | | | | We make heavy use of BPF functionality these days, hence expose the BPF file system too by default now. (Note however, that we don't actually make use bpf file systems object yet, but we might later on too.)
* pid1: do not initialize join_controllers by defaultZbigniew Jędrzejewski-Szmek2018-05-30
| | | | | We're moving towards unified cgroup hierarchy where this is not necessary. This makes main.c a bit simpler.
* meson: drop double .in suffix for o.fd.systemd1.policy fileZbigniew Jędrzejewski-Szmek2018-05-30
| | | | | This file is now undergoing just one transformation, so drop the unnecessary suffix.
* Gettextize policy filesGunnar Hjalmarsson2018-05-30
| | | | | | | * Don't merge translations into the files * Add gettext-domain="systemd" to description and message Closes #8162, replaces #8118.
* meson: add -Dmemory-accounting-default=true|falseZbigniew Jędrzejewski-Szmek2018-05-30
| | | | | This makes it easy to set the default for distributions and users which want to default to off because they primarily use older kernels.
* core: add new new bus call for migrating foreign processes to scope/service ↵Lennart Poettering2018-05-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | units This adds a new bus call to service and scope units called AttachProcesses() that moves arbitrary processes into the cgroup of the unit. The primary user for this new API is systemd itself: the systemd --user instance uses this call of the systemd --system instance to migrate processes if itself gets the request to migrate processes and the kernel refuses this due to access restrictions. The primary use-case of this is to make "systemd-run --scope --user …" invoked from user session scopes work correctly on pure cgroupsv2 environments. There, the kernel refuses to migrate processes between two unprivileged-owned cgroups unless the requestor as well as the ownership of the closest parent cgroup all match. This however is not the case between the session-XYZ.scope unit of a login session and the user@ABC.service of the systemd --user instance. The new logic always tries to move the processes on its own, but if that doesn't work when being the user manager, then the system manager is asked to do it instead. The new operation is relatively restrictive: it will only allow to move the processes like this if the caller is root, or the UID of the target unit, caller and process all match. Note that this means that unprivileged users cannot attach processes to scope units, as those do not have "owning" users (i.e. they have now User= field). Fixes: #3388
* cgroup: add a new "can_delegate" flag to the unit vtable, and set it for ↵Lennart Poettering2018-05-30
| | | | | | | | | | | | | | | | scope and service units only Currently we allowed delegation for alluntis with cgroup backing except for slices. Let's make this a bit more strict for now, and only allow this in service and scope units. Let's also add a generic accessor unit_cgroup_delegate() for checking whether a unit has delegation turned on that checks the new bool first. Also, when doing transient units, let's explcitly refuse turning on delegation for unit types that don#t support it. This is mostly cosmetical as we wouldn't act on the delegation request anyway, but certainly helpful for debugging.
* core: propagate TasksMax= on the root slice to sysctlsLennart Poettering2018-05-30
| | | | | | | | | | The cgroup "pids" controller is not supported on the root cgroup. However we expose TasksMax= on it, but currently don't actually apply it to anything. Let's correct this: if set, let's propagate things to the right sysctls. This way we can expose TasksMax= on all units in a somewhat sensible way.