path: root/src
Commit message (Author, Date)
* core: turn on memory/cpu/tasks accounting by default for the root slice (Lennart Poettering, 2018-05-30)
    The kernel exposes the necessary data in /proc anyway, let's expose it hence by default.

    With this in place "systemctl status -- -.slice" will show accounting data out-of-the-box now.
* core: hook up /proc queries for the root slice, too (Lennart Poettering, 2018-05-30)
    Do what we already prepped in cgtop for the root slice in PID 1 too: consult /proc for the data we need.
* cgroup-util: rework cg_get_keyed_attribute() a bit (Lennart Poettering, 2018-05-30)
    Let's make sure we don't clobber the return parameter on failure, to follow our coding style. Also, break the loop early if we have all attributes we need.

    This also changes the keys parameter to a simple char**, so that we can use STRV_MAKE() for passing the list of attributes to read.

    This also makes it possible to distinguish the case when the whole attribute file doesn't exist from one key in it missing. In the former case we return -ENOENT, in the latter we now return -ENXIO.
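The "don't clobber the return parameter on failure" and "-ENXIO for a missing key" conventions described above can be sketched roughly as follows. This is not systemd's cg_get_keyed_attribute(): the function name and signature are hypothetical, it looks up a single key, and it parses an in-memory buffer rather than reading a cgroup attribute file, so the -ENOENT case (the file itself is missing) does not appear here.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: looks up `key` in a buffer of "key value\n" lines,
 * such as the contents of a cgroup "cpu.stat" file. On success it returns 0
 * and stores a newly allocated copy of the value in *ret. If the key is
 * absent it returns -ENXIO and leaves *ret untouched. */
static int keyed_attribute_lookup_sketch(const char *contents, const char *key, char **ret) {
        size_t key_len;
        const char *line;

        assert(contents);
        assert(key);
        assert(ret);

        key_len = strlen(key);
        line = contents;

        while (*line) {
                size_t line_len = strcspn(line, "\n");

                /* A match is the key followed by exactly one space. */
                if (line_len > key_len &&
                    strncmp(line, key, key_len) == 0 &&
                    line[key_len] == ' ') {
                        size_t value_len = line_len - key_len - 1;
                        char *value = malloc(value_len + 1);
                        if (!value)
                                return -ENOMEM;

                        memcpy(value, line + key_len + 1, value_len);
                        value[value_len] = '\0';

                        *ret = value; /* written only on success */
                        return 0;     /* stop scanning once the key is found */
                }

                line += line_len;
                if (*line == '\n')
                        line++;
        }

        return -ENXIO; /* buffer was readable, but the key is not in it */
}
```

A caller can thus keep a previously initialized output variable across a failed lookup and tell "key missing" apart from other errors by the return code alone.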
* procfs-util: add APIs to get consumed CPU time and used memory from /proc (Lennart Poettering, 2018-05-30)
    This is preparation for emulating the "usage_usec" keyed attribute of the "cpu.stat" property of the root cgroup from data in /proc. Similarly, for emulating the "memory.current" attribute.
* core: don't process dbus unit and job queue when there are already too many messages pending (Lennart Poettering, 2018-05-30)
    We maintain a queue of units and jobs that we are supposed to generate change/new notifications for because they were either just created or some of their properties have changed. Let's throttle processing of this queue a bit: as soon as > 1K of bus messages are queued for writing let's skip processing the queue, and then recheck on the next iteration again.

    Moreover, never process more than 100 units in one go, return to the event loop after that. Both limits together should put effective limits on both space and time usage of the function, delaying further operations until a later moment, when the queue is empty or the event loop is sufficiently idle again.

    This should keep the number of generated messages much lower than before on busy systems or where some client is hanging.

    Note that this also means a bad client can slow down message dispatching substantially for up to 90s if it likes to, for all clients. But that should be acceptable as we only allow trusted bus clients, anyway.

    Fixes: #8166
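The two limits described above can be illustrated with a small sketch. The function and constant names are hypothetical, the queue is reduced to a counter, and the exact values and the place where PID 1 checks them are assumptions taken only from the wording of the commit message ("> 1K" pending messages, at most 100 units per pass).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed limits, read off the commit message; the real constants may differ. */
#define PENDING_MESSAGES_MAX    1024U
#define UNITS_PER_ITERATION_MAX 100U

/* Hypothetical sketch: processes queued units and returns how many were
 * handled in this pass. `pending_bus_messages` stands in for the size of
 * the bus write queue, `*queued_units` for the length of the dbus queue. */
static size_t dispatch_queue_sketch(size_t pending_bus_messages, size_t *queued_units) {
        size_t n = 0;

        assert(queued_units);

        /* Space limit: too much is already waiting to be written, so skip
         * this pass entirely and recheck on the next event loop iteration. */
        if (pending_bus_messages > PENDING_MESSAGES_MAX)
                return 0;

        /* Time limit: handle a bounded batch, then return to the event loop. */
        while (*queued_units > 0 && n < UNITS_PER_ITERATION_MAX) {
                (*queued_units)--; /* stand-in for emitting one notification */
                n++;
        }

        return n;
}
```

With 250 queued units and an idle bus, three passes of 100, 100 and 50 units drain the queue; a pass that finds more than 1024 pending messages leaves the queue as it is.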
* sd-bus: add APIs to query the current read and write queue size (Lennart Poettering, 2018-05-30)
* process-util: don't install atfork() handler more than once (Lennart Poettering, 2018-05-30)
* util: add new safe_close_above_stdio() wrapper (Lennart Poettering, 2018-05-30)
    At various places we only want to close fds if they are not stdin/stdout/stderr, i.e. fds 0, 1, 2. Let's add a unified helper call for that, and port everything over.
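A helper of this kind might look roughly like the sketch below. The name is hypothetical and the body is an assumption based on the description above, not a copy of systemd's safe_close_above_stdio(); returning -1 unconditionally is a convention chosen here so that callers can write `fd = close_above_stdio_sketch(fd);` to invalidate their variable.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical sketch: closes `fd` only if it is not stdin, stdout or
 * stderr (fds 0, 1, 2), and always returns -1 so the caller can assign the
 * result back to the variable that held the fd. */
static int close_above_stdio_sketch(int fd) {
        if (fd >= 3) {
                int r = close(fd);

                /* Closing an fd that is not open points at a programming
                 * error elsewhere, so treat EBADF as fatal. */
                assert(r == 0 || errno != EBADF);
        }

        return -1;
}
```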
* doc: add a new doc/ directory, and move two markdown docs into them (Lennart Poettering, 2018-05-30)
    I figure sooner or later we'll have more of these docs, hence let's give them a clean place to be.

    This leaves NEWS and README/README.md as well as the LICENSE texts in the root directory of the project since that appears to be customary for Free Software projects.
* rules: add a missing comma in 70-uaccess.rules since it improves readability (Franck Bui, 2018-05-30)
    rule-syntax-check.py failed with the following error:

        $ ./test/rule-syntax-check.py ./src/login/70-uaccess.rules
        Invalid line ./src/login/70-uaccess.rules:31: SUBSYSTEM=="sound", TAG+="uaccess" OPTIONS+="static_node=snd/timer", OPTIONS+="static_node=snd/seq"
          clause: TAG+="uaccess" OPTIONS+="static_node=snd/timer"

    The comma is actually optional but the script makes it mandatory which seems a good thing since it improves readability.
* missing_syscall: add pkey_mprotect for ppc (#8292) (Zbigniew Jędrzejewski-Szmek, 2018-05-30)
    Accurate for both ppc and ppc64 according to https://fedora.juszkiewicz.com.pl/syscalls.html.
* khash: try to detect broken AF_ALG support in centos kernels (Lennart Poettering, 2018-05-30)
    Fixes: #8278
* sd-login: make use of _cleanup_close_ where possible (Lennart Poettering, 2018-05-30)
* logind: make sure we don't trip up on half-initialized session devices (Lennart Poettering, 2018-05-30)
    Fixes: #8035
* logind: check file is device node before using .st_rdev (Lennart Poettering, 2018-05-30)
* logind: let's pack a few struct fields we can pack (Lennart Poettering, 2018-05-30)
* logind: fd 0 is a valid fd (Lennart Poettering, 2018-05-30)
* logind: let's reduce one level of indentation (Lennart Poettering, 2018-05-30)
* logind: propagate the right error, don't make up ENOMEM (Lennart Poettering, 2018-05-30)
* logind: rework sd_eviocrevoke() (Lennart Poettering, 2018-05-30)
    Let's initialize static variables properly and get rid of redundant variables.
* logind: trivial improvements (Lennart Poettering, 2018-05-30)
    Just some additional whitespace, some additional assert()s, and removal of redundant variables.
* basic/xattr-util: do not cast ssize_t to int (Zbigniew Jędrzejewski-Szmek, 2018-05-30)
    gcc warns about uninitialized memory access because it notices that ssize_t which is < 0 could be cast to positive int value. We know that this can't really happen because only -1 can be returned, but OTOH, in principle a large *positive* value cannot be cast properly. This is unlikely too, since xattrs cannot be too large, but it seems cleaner to just use a size_t to return the value and avoid the cast altogether. This makes the code simpler and gcc is happy too.

    The following warning goes away:

        [113/1502] Compiling C object 'src/basic/basic@sta/xattr-util.c.o'.
        In file included from ../src/basic/alloc-util.h:28:0,
                         from ../src/basic/xattr-util.c:30:
        ../src/basic/xattr-util.c: In function ‘fd_getcrtime_at’:
        ../src/basic/macro.h:207:60: warning: ‘b’ may be used uninitialized in this function [-Wmaybe-uninitialized]
                         UNIQ_T(A,aq) < UNIQ_T(B,bq) ? UNIQ_T(A,aq) : UNIQ_T(B,bq); \
                                                                    ^
        ../src/basic/xattr-util.c:155:19: note: ‘b’ was declared here
                 usec_t a, b;
                           ^
* basic/exec-util: use _exit() to return from child (Zbigniew Jędrzejewski-Szmek, 2018-05-30)
* basic: shorten the code a bit in two places (Zbigniew Jędrzejewski-Szmek, 2018-05-30)
    gcc complains that len might be used uninitialized, but afaict, this is not true.
* tree-wide: use reallocarray instead of our home-grown realloc_multiply (#8279) (Zbigniew Jędrzejewski-Szmek, 2018-05-30)
    There isn't much difference, but in general we prefer to use the standard functions. glibc provides reallocarray since version 2.26.

    I moved the explicit_bzero configure test to the bottom, so that the two stdlib functions are at the bottom.
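What both functions provide over a plain realloc() is a check that the element count times the element size does not overflow. The sketch below shows that check in isolation; the function names are hypothetical and this is neither glibc's reallocarray() nor the removed systemd helper, only an illustration of the idea.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch of an overflow-checked realloc(): refuses the request
 * if nmemb * size would not fit into a size_t. */
static void *realloc_multiply_sketch(void *p, size_t nmemb, size_t size) {
        if (size != 0 && nmemb > SIZE_MAX / size) {
                errno = ENOMEM;
                return NULL;
        }

        return realloc(p, nmemb * size);
}

/* Hypothetical caller: resizes an int array to `n` elements. On failure the
 * old array stays valid and *array is left untouched. */
static int int_array_resize_sketch(int **array, size_t n) {
        int *q;

        assert(array);
        assert(n > 0); /* realloc() with size 0 is deliberately not handled here */

        q = realloc_multiply_sketch(*array, n, sizeof(int));
        if (!q)
                return -ENOMEM;

        *array = q;
        return 0;
}
```

On systems with glibc 2.26 or newer, the call inside int_array_resize_sketch() could be replaced by reallocarray(*array, n, sizeof(int)), which performs the same overflow check.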
* basic/virt: provide a nicer message if /proc/cpuinfo is not available (Zbigniew Jędrzejewski-Szmek, 2018-05-30)
        $ sudo systemd-run -p RootDirectory=/usr -E LD_LIBRARY_PATH=/lib/systemd/ -E SYSTEMD_LOG_LEVEL=debug /bin/systemd-detect-virt

    Before:

        systemd-detect-virt[18498]: No virtualization found in DMI
        systemd-detect-virt[18498]: No virtualization found in CPUID
        systemd-detect-virt[18498]: Virtualization XEN not found, /proc/xen does not exist
        systemd-detect-virt[18498]: This platform does not support /proc/device-tree
        systemd-detect-virt[18498]: Failed to check for virtualization: No such file or directory

    The first four lines are at debug level, so the user would only see that last one usually, which is not very enlightening.

    This now becomes:

        systemd-detect-virt[21172]: No virtualization found in DMI
        systemd-detect-virt[21172]: No virtualization found in CPUID
        systemd-detect-virt[21172]: Virtualization XEN not found, /proc/xen does not exist
        systemd-detect-virt[21172]: This platform does not support /proc/device-tree
        systemd-detect-virt[21172]: /proc/cpuinfo not found, assuming no UML virtualization.
        systemd-detect-virt[21172]: This platform does not support /proc/sysinfo
        systemd-detect-virt[21172]: Found VM virtualization none
        systemd-detect-virt[21172]: none

    We do more checks, which is good too.
* basic/log: add an assert that does not recurse into logging functions (Zbigniew Jędrzejewski-Szmek, 2018-05-30)
    Then it can be used in the asserts in logging functions without causing infinite recursion. The error is just printed to stderr, it should be good enough for the common case.
* udev/net-id: check all snprintf return values (Zbigniew Jędrzejewski-Szmek, 2018-05-30)
    gcc-8 throws an error if it knows snprintf might truncate output and the return value is ignored:

        ../src/udev/udev-builtin-net_id.c: In function 'dev_pci_slot':
        ../src/udev/udev-builtin-net_id.c:297:47: error: '%s' directive output may be truncated writing up to 255 bytes into a region of size between 0 and 4095 [-Werror=format-truncation=]
                         snprintf(str, sizeof str, "%s/%s/address", slots, dent->d_name);
                                                       ^~
        ../src/udev/udev-builtin-net_id.c:297:17: note: 'snprintf' output between 10 and 4360 bytes into a destination of size 4096
                         snprintf(str, sizeof str, "%s/%s/address", slots, dent->d_name);
                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        cc1: some warnings being treated as errors

    Let's check all return values. This actually makes the code better, because there's no point in trying to open a file when the name has been truncated, etc.
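Checking the return value means comparing it against the buffer size: snprintf() returns the length the full output would have had, so a result greater than or equal to the size signals truncation. A minimal sketch, with a hypothetical function name and error codes chosen for illustration (the format string is the one quoted in the compiler output above):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical sketch: builds "<slots>/<name>/address" in `buf` and reports
 * truncation instead of silently handing back a cut-off path. */
static int format_slot_address_path_sketch(char *buf, size_t size, const char *slots, const char *name) {
        int n;

        assert(buf);
        assert(slots);
        assert(name);

        n = snprintf(buf, size, "%s/%s/address", slots, name);
        if (n < 0)
                return -EINVAL;        /* output error */
        if ((size_t) n >= size)
                return -ENAMETOOLONG;  /* output did not fit, path is unusable */

        return 0;
}
```

A caller would skip opening the file whenever the function returns a negative value, which is the behaviour the commit message argues for.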
* basic/log: make sure header is printed correctly, add test (Zbigniew Jędrzejewski-Szmek, 2018-05-30)
    If log_do_header() was called with overly long parameters, it'd generate improper output. Essentially, it'd be truncated at random point, in particular missing a newline at the end, so it'd run with the next field, usually MESSAGE=.

    log_do_header is called with parameters from compiled code (file name, line number, etc), so in practice this was unlikely to ever be a problem, but it is possible. In particular, if systemd was compiled from sources in some deeply nested directory (which happens for example in mock and other build roots), the filename could be very long.

    As a safety measure, let's truncate all parameters to 256 bytes. So we have 5 fields which are 256 bytes (plus the field name prefix), and a few other fields with fixed width. This must always fit in the 2048 byte buffer.

    I don't think there's much gain in calculating the required length precisely, since it's a lot of fields and a few bytes allocated on the stack don't matter.
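One way to cap each parameter at 256 bytes is a precision on the %s conversion, which limits how many bytes of the argument are printed. The sketch below illustrates that technique on a reduced header with two string fields; the function name and field selection are hypothetical and this is not the actual log_do_header() code.

```c
#include <assert.h>
#include <stdio.h>

#define FIELD_MAX 256 /* per-field cap named in the commit message */

/* Hypothetical sketch: formats a reduced journal-style header. Each string
 * argument is cut to FIELD_MAX bytes by the "%.*s" precision, so the total
 * length is bounded and the trailing newline of every field survives. */
static int format_header_sketch(char *buf, size_t size, const char *file, int line, const char *func) {
        assert(buf);
        assert(file);
        assert(func);

        return snprintf(buf, size,
                        "CODE_FILE=%.*s\n"
                        "CODE_LINE=%i\n"
                        "CODE_FUNC=%.*s\n",
                        FIELD_MAX, file,
                        line,
                        FIELD_MAX, func);
}
```

With two capped string fields the output is well below 2048 bytes regardless of the input length, so a buffer of that size can never be overrun or truncated mid-field in this reduced form.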
* basic/log: fix confusion with parameters to log_dispatch_internal (Zbigniew Jędrzejewski-Szmek, 2018-05-30)
    log_dispatch_internal has only one caller where the extra_field/extra params are not null: log_unit_full. When log_unit_full() was called, when we got to log_dispatch_internal, our header would look like this:

        PRIORITY=7
        SYSLOG_FACILITY=3
        CODE_FILE=../src/core/manager.c
        CODE_LINE=2145
        CODE_FUNC=manager_invoke_sigchld_event
        USER_UNIT=gnome-terminal-server.service
        65dffa7a3b984a6d9a46f0b8fb57710bUSER_INVOCATION_ID=
        SYSLOG_IDENTIFIER=systemd

    It took me a while to understand why I'm not seeing mangled messages in the journal (after all, "" is a valid rvalue for log messages). The answer is that journald rejects any field name which starts with a digit, and the MESSAGE_ID that was used here starts with a digit. Hence, those lines would be silently filtered out.
* basic/log: make log_object_internalv static (Zbigniew Jędrzejewski-Szmek, 2018-05-30)
    It makes the code easier to read, because it's obvious that the function cannot be called from elsewhere.
* basic/log: voidify snprintf statements (Zbigniew Jędrzejewski-Szmek, 2018-05-30)
    The buffers are fixed size, so the message may not fit, but we don't particularly care.
* Revert "Replace use of snprintf with xsprintf" (Zbigniew Jędrzejewski-Szmek, 2018-05-30)
    This reverts commit a7419dbc59da5c8cc9e90b3d96bc947cad91ae16.

    _All_ changes in that commit were wrong.

    Fixes #8211.
* login: fix user@.service case, so we don't allow nested sessions (#8051) (Alan Jenkins, 2018-05-30)
    > logind sessions are mostly bound to the audit session concept, and audit
    > sessions remain unaffected by "su", in fact they are defined to be
    > "sealed off", i.e. in a way that if a process entered a session once, it
    > will always stay with it, and so will its children, i.e. the only way to
    > get a new session is by forking off something off PID 1 (or something
    > similar) that never has been part of a session.

    The code had a gap. user@.service is a special case PAM session which does not create a logind session. Let's remember to check for it.

    Fixes #8021
* Fix format-truncation compile failure by typecasting USB IDs (#8250) (Patrick Uiterwijk, 2018-05-30)
    This patch adds safe_atoux16 for parsing an unsigned hexadecimal 16bit int, and uses that for parsing USB device and vendor IDs.

    This fixes a compile error with gcc-8 because while we know that USB IDs are 2 bytes, the compiler does not know that.

        ../src/udev/udev-builtin-hwdb.c:80:38: error: '%04X' directive output may be truncated writing between 4 and 8 bytes into a region of size between 2 and 6 [-Werror=format-truncation=]

    Signed-off-by: Adam Williamson <awilliam@redhat.com>
    Signed-off-by: Patrick Uiterwijk <puiterwijk@redhat.com>
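The reasoning can be illustrated with a sketch: once the IDs are held in a 16-bit type, "%04X" can emit at most four digits, which the compiler can see. The names below are hypothetical and the parser is a simplified stand-in for safe_atoux16, not its actual implementation; the "usb:v...p...*" output format is an assumption used purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: parses a hexadecimal string into a uint16_t. Leaves
 * *ret untouched on failure. Note that strtoul() also accepts leading
 * whitespace, a sign and a "0x" prefix, which a stricter parser might reject. */
static int parse_hex_u16_sketch(const char *s, uint16_t *ret) {
        char *end = NULL;
        unsigned long v;

        assert(s);
        assert(ret);

        errno = 0;
        v = strtoul(s, &end, 16);
        if (errno != 0)
                return -errno;
        if (end == s || *end != '\0')
                return -EINVAL; /* empty string or trailing garbage */
        if (v > UINT16_MAX)
                return -ERANGE; /* does not fit into 16 bits */

        *ret = (uint16_t) v;
        return 0;
}

/* Hypothetical sketch: with uint16_t arguments each "%04X" expands to
 * exactly four characters, so the output length is known to be 15. */
static int format_usb_ids_sketch(char *buf, size_t size, uint16_t vendor, uint16_t product) {
        int n;

        assert(buf);

        n = snprintf(buf, size, "usb:v%04Xp%04X*", vendor, product);
        if (n < 0 || (size_t) n >= size)
                return -ENOBUFS;

        return 0;
}
```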
* virt: detect QNX hypervisor (Shuang Liu, 2018-05-30)
    Detect QNX hypervisor based on the CPUID.

    Fixes: #7239
* mount-setup: always use the same source as fstype for the API VFS we mount (Lennart Poettering, 2018-05-30)
    So far, for all our API VFS mounts we used the fstype also as mount source, let's do that for the cgroupsv2 mounts too. The kernel doesn't really care about the source for API VFS, but it's visible to the user, hence let's clean this up and follow the rule we otherwise follow.
* bpf: use BPF_F_ALLOW_MULTI flag if it is available (Lennart Poettering, 2018-05-30)
    This new kernel 4.15 flag permits that multiple BPF programs can be executed for each packet processed: multiple per cgroup plus all programs defined up the tree on all parent cgroups.

    We can use this for two features:

    1. Finally provide per-slice IP accounting (which was previously unavailable)

    2. Permit delegation of BPF programs to services (i.e. leaf nodes).

    This patch beefs up PID1's handling of BPF to enable both.

    Note two special items to keep in mind:

    a. Our inner-node BPF programs (i.e. the ones we attach to slices) do not enforce IP access lists, that's done exclusively in the leaf-node BPF programs. That's a good thing, since that way rules in leaf nodes can cancel out rules further up (i.e. for example to implement a logic of "disallow everything except httpd.service"). Inner node BPF programs do accounting however if that's requested. This is beneficial for performance reasons: it means in order to provide per-slice IP accounting we don't have to add up all child unit's data.

    b. When this code is run on pre-4.15 kernel (i.e. where BPF_F_ALLOW_MULTI is not available) we'll make IP accounting on slice units unavailable (i.e. revert to behaviour from before this commit). For leaf nodes we'll fallback to non-ALLOW_MULTI mode however, which means that BPF delegation is not available there at all, if IP fw/acct is turned on for the unit. This is a change from earlier behaviour, where we use the BPF_F_ALLOW_OVERRIDE flag, so that our fw/acct would lose its effect as soon as delegation was turned on and some client made use of that. I think the new behaviour is the safer choice in this case, as silent bypassing of our fw rules is not possible anymore. And if people want proper delegation then the way out is a more modern kernel or turning off IP firewalling/acct for the unit altogether.
* bpf: mount bpffs by default on boot (Lennart Poettering, 2018-05-30)
    We make heavy use of BPF functionality these days, hence expose the BPF file system too by default now. (Note however, that we don't actually make use of bpf file system objects yet, but we might later on too.)
* nologin: extend the /run/nologin descriptions a bit (#8244) (Lennart Poettering, 2018-05-30)
    This is an attempt to improve #8228 a bit, by extending the /run/nologin a bit, but still keeping it somewhat brief.

    On purpose I used the vague wording "unprivileged user" rather than "non-root user" so that pam_nologin can be updated to disable its behaviour for members of the "wheel" group one day, and our messages would still make sense.

    See #8228.
* basic: split out update_reboot_parameter_and_warn() into its own .c/.h files (Lennart Poettering, 2018-05-30)
    This is primarily preparation for a follow-up commit that adds a common implementation of the other side of the reboot parameter file, i.e. the code that reads the file and issues reboot() for it.
* basic: add a common syscall wrapper around reboot() (Lennart Poettering, 2018-05-30)
    This mimics the raw_clone() call we have in place already and establishes a new syscall wrapper raw_reboot() that wraps the kernel's reboot() system call in a bit more low-level fashion than glibc's reboot() wrapper. The main difference is that the extra "arg" argument is supported.

    Ultimately this just replaces the syscall wrapper implementation we currently have at three places in our codebase by a single one.

    With this change this means that all our syscall() invocations are neatly separated out in static inline system call wrappers in our header functions.
* missing: always use __NR_ as prefix for syscall numbers (Lennart Poettering, 2018-05-30)
    Apparently, both __NR_ and SYS_ are useful, but we mostly use __NR_ hence use it for these two cases too, so that we settle on __NR_ exclusively.
* missing: Fix statx syscall ifdeffery (Lennart Poettering, 2018-05-30)
    Fix a copy/paste mistake.

    Fixes: #8238
* tree-wide: reopen log when we need to log in FORK_CLOSE_ALL_FDS children (Lennart Poettering, 2018-05-30)
    In a number of occasions we use FORK_CLOSE_ALL_FDS when forking off a child, since we don't want to pass fds to the processes spawned (either because we later want to execve() some other process there, or because our child might hang around for longer than expected, in which case it shouldn't keep our fd pinned). This also closes any logging fds, and thus means logging is turned off in the child. If we want to do proper logging, explicitly reopen the logs hence in the child at the right time.

    This is particularly crucial in the umount/remount children we fork off the shutdown binary, as otherwise the children can't log, which is why #8155 is harder to debug than necessary: the log messages we generate about failing mount() system calls aren't actually visible on screen, as they are done in the child processes where the log fds are closed.
* log: only open kmsg on fallback if we actually want to use it (Lennart Poettering, 2018-05-30)
    Previously, we'd try to open kmsg on failure of the journal/syslog even if no automatic fallback to kmsg was requested, and we wouldn't even use the open connection afterwards...
* missing_syscall: when adding syscall replacements, use different names (#8229) (Zbigniew Jędrzejewski-Szmek, 2018-05-30)
    In meson.build we check that functions are available using:

        meson.get_compiler('c').has_function('foo')

    which checks the following:

    - if __stub_foo or __stub___foo are defined, return false
    - if foo is declared (a pointer to the function can be taken), return true
    - otherwise check for __builtin_memfd_create

    _stub is documented by glibc as

        It defines a symbol '__stub_FUNCTION' for each function in the C library which is a stub, meaning it will fail every time called, usually setting errno to ENOSYS.

    So if __stub is defined, we know we don't want to use the glibc version, but this doesn't tell us if the name itself is defined or not. If it _is_ defined, and we define our replacement as an inline static function, we get an error:

        In file included from ../src/basic/missing.h:1358:0,
                         from ../src/basic/util.h:47,
                         from ../src/basic/calendarspec.h:29,
                         from ../src/basic/calendarspec.c:34:
        ../src/basic/missing_syscall.h:65:19: error: static declaration of 'memfd_create' follows non-static declaration
         static inline int memfd_create(const char *name, unsigned int flags) {
                           ^~~~~~~~~~~~
        .../usr/include/bits/mman-shared.h:46:5: note: previous declaration of 'memfd_create' was here
         int memfd_create (const char *__name, unsigned int __flags) __THROW;
             ^~~~~~~~~~~~

    To avoid this problem, call our inline functions different than glibc, and use a #define to map the official name to our replacement.

    Fixes #8099.

    v2:
    - use "missing_" as the prefix instead of "_"

    v3:
    - rebase and update for statx()

      Unfortunately "statx" is also present in "struct statx", so the define causes issues. Work around this by using a typedef.

    I checked that systemd compiles with current glibc (glibc-devel-2.26-24.fc27.x86_64) if HAVE_MEMFD_CREATE, HAVE_GETTID, HAVE_PIVOT_ROOT, HAVE_SETNS, HAVE_RENAMEAT2, HAVE_KCMP, HAVE_KEYCTL, HAVE_COPY_FILE_RANGE, HAVE_BPF, HAVE_STATX are forced to 0.

    Setting HAVE_NAME_TO_HANDLE_AT to 0 causes an issue, but it's not because of the define, but because of struct file_handle.
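The naming scheme described above can be sketched for a single syscall. The example uses gettid() because it is harmless to call; it assumes Linux, and it applies the mapping unconditionally, whereas the commit message implies the real header guards it with a HAVE_GETTID style check. The helper is_main_thread_sketch() is hypothetical and only shows a call site.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() on glibc */
#endif

#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* The replacement gets its own name, so it cannot clash with a declaration
 * of gettid() that the C library may or may not provide. */
static inline pid_t missing_gettid(void) {
        pid_t tid = (pid_t) syscall(SYS_gettid);

        assert(tid > 0); /* gettid() cannot fail on Linux */
        return tid;
}

/* Map the official name to the replacement. Code below this line keeps
 * calling gettid() and ends up in missing_gettid(). */
#define gettid missing_gettid

/* Hypothetical call site: true if the calling thread is the main thread. */
static int is_main_thread_sketch(void) {
        return gettid() == getpid();
}
```

Because the static inline function is never named gettid, the "static declaration follows non-static declaration" error quoted above cannot occur, whether or not the installed glibc declares the function.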
* test-user-util: skip most tests for nobody if synthetization is off (Zbigniew Jędrzejewski-Szmek, 2018-05-30)
    When synthetisation is turned off, there are just too many ways those tests can go wrong. We are not interested in verifying that the db on disk is correct, let's just skip all checks.

    In the first version of this patch, I recorded if we detected a mismatch during configuration and only skipped tests in that case, but actually it is possible to change the host configuration between our configuration phase and running of the tests. It's just more robust to skip always. (This is particularly true if tests are installed.)
* test-user-util: print function delimiters (Zbigniew Jędrzejewski-Szmek, 2018-05-30)
    This makes it easier to see what is going on. Crashes may happen in a nested test_{uid,gid}_to_name_one() function, and the default backtrace doesn't show the actual string being tested.
* mount-util: call mount_option_mangle() in mount_verbose() (Yu Watanabe, 2018-05-30)