| Commit message | Author | Age |
| |
The kernel exposes the necessary data in /proc anyway, so let's expose it
by default.
With this in place "systemctl status -- -.slice" will show accounting
data out-of-the-box now.
| |
Do what we already prepped in cgtop for the root slice in PID 1 too:
consult /proc for the data we need.
| |
Let's make sure we don't clobber the return parameter on failure, to
follow our coding style. Also, break the loop early if we have all
attributes we need.
This also changes the keys parameter to a simple char**, so that we can
use STRV_MAKE() for passing the list of attributes to read.
This also makes it possible to distinguish the case when the whole
attribute file doesn't exist from one key in it missing. In the former
case we return -ENOENT, in the latter we now return -ENXIO.
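The return-value convention described above can be sketched roughly as
follows. This is only an illustration: read_keyed_attributes() is a
hypothetical helper, not the function touched by this commit, and the
"key value" line format is an assumption. It leaves the return parameter
untouched on failure, stops reading once all keys were seen, and maps a
missing file to -ENOENT and a missing key to -ENXIO.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for strdup() under strict -std= modes */
#endif
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: looks up the NULL-terminated list 'keys' in a file of
 * "key value" lines. On success stores one malloc()ed value per key in
 * 'ret_values' (same order). 'ret_values' is only written on success. */
static int read_keyed_attributes(const char *path, char **keys, char **ret_values) {
        size_t n = 0, n_found = 0;
        char line[256], **values;
        FILE *f;
        int r = 0;

        while (keys[n])
                n++;

        f = fopen(path, "r");
        if (!f)
                return -errno; /* -ENOENT if the whole attribute file is missing */

        values = calloc(n + 1, sizeof(char*));
        if (!values) {
                fclose(f);
                return -ENOMEM;
        }

        /* Stop reading as soon as every requested key has been seen */
        while (n_found < n && fgets(line, sizeof line, f)) {
                line[strcspn(line, "\n")] = '\0';

                for (size_t i = 0; i < n; i++) {
                        size_t l = strlen(keys[i]);

                        if (values[i] || strncmp(line, keys[i], l) != 0 || line[l] != ' ')
                                continue;

                        values[i] = strdup(line + l + 1);
                        if (!values[i])
                                r = -ENOMEM;
                        else
                                n_found++;
                        break;
                }

                if (r < 0)
                        break;
        }
        fclose(f);

        if (r >= 0 && n_found < n)
                r = -ENXIO; /* the file exists, but at least one key is missing */

        if (r < 0) {
                for (size_t i = 0; i < n; i++)
                        free(values[i]);
                free(values);
                return r;
        }

        memcpy(ret_values, values, n * sizeof(char*));
        free(values);
        return 0;
}
```

A caller can then tell "no such attribute file" apart from "file present,
key absent" by comparing the result against -ENOENT and -ENXIO.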
| |
This is preparation for emulating the "usage_usec" keyed attribute of
the "cpu.stat" property of the root cgroup from data in /proc. Similarly,
for emulating the "memory.current" attribute.
| |
messages pending
We maintain a queue of units and jobs that we are supposed to generate
change/new notifications for because they were either just created or
some of their properties have changed. Let's throttle processing of this
queue a bit: as soon as > 1K of bus messages are queued for writing
let's skip processing the queue, and then recheck on the next
iteration again.
Moreover, never process more than 100 units in one go, return to the
event loop after that. Both limits together should put effective limits
on both space and time usage of the function, delaying further
operations until a later moment, when the queue is empty or the
event loop is sufficiently idle again.
This should keep the number of generated messages much lower than
before on busy systems or where some client is hanging.
Note that this also means a bad client can slow down message dispatching
substantially for up to 90s if it likes to, for all clients. But that
should be acceptable as we only allow trusted bus clients, anyway.
Fixes: #8166
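A toy model of the two limits, for illustration only. The struct, the
function name and the constants' names are hypothetical, and "1K" is read
here as a count of pending messages (an assumption); the real code works
on actual unit and job queues.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING_BUS_MESSAGES 1024U /* assumed meaning of "> 1K" */
#define MAX_UNITS_PER_ITERATION 100U

/* Hypothetical, stripped-down manager state */
struct toy_manager {
        size_t n_queued_units;          /* units waiting for a change signal */
        size_t n_pending_bus_messages;  /* messages not yet written to the bus */
};

/* Returns how many units were processed in this event loop iteration */
static size_t toy_dispatch_dbus_queue(struct toy_manager *m) {
        size_t n = 0;

        /* Too much output pending already: skip, recheck on next iteration */
        if (m->n_pending_bus_messages > MAX_PENDING_BUS_MESSAGES)
                return 0;

        /* Never handle more than 100 units before returning to the event loop */
        while (m->n_queued_units > 0 && n < MAX_UNITS_PER_ITERATION) {
                m->n_queued_units--;
                m->n_pending_bus_messages++; /* one signal per unit in this toy model */
                n++;
        }

        return n;
}
```

With 250 queued units and an idle bus, three iterations process 100, 100
and 50 units; with more than 1024 messages pending nothing is processed.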
| |
At various places we only want to close fds if they are not
stdin/stdout/stderr, i.e. fds 0, 1, 2. Let's add a unified helper call
for that, and port everything over.
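Such a helper could look roughly like this. The name close_above_stdio()
is made up for this sketch and need not match the helper added by the
commit; returning -1 so that callers can write "fd = close_above_stdio(fd)"
is likewise an assumption about the calling convention.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: closes 'fd' unless it is stdin/stdout/stderr (or
 * already invalid). Always returns -1, so that the caller can invalidate
 * its variable in the same statement. */
static int close_above_stdio(int fd) {
        if (fd > STDERR_FILENO)
                (void) close(fd);

        return -1;
}
```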
| |
I figure sooner or later we'll have more of these docs, hence let's give
them a clean place to be.
This leaves NEWS and README/README.md as well as the LICENSE texts in
the root directory of the project since that appears to be customary for
Free Software projects.
| |
rule-syntax-check.py failed with the following error:
$ ./test/rule-syntax-check.py ./src/login/70-uaccess.rules
Invalid line ./src/login/70-uaccess.rules:31: SUBSYSTEM=="sound", TAG+="uaccess" OPTIONS+="static_node=snd/timer", OPTIONS+="static_node=snd/seq"
clause: TAG+="uaccess" OPTIONS+="static_node=snd/timer"
The comma is actually optional but the script makes it mandatory which seems a
good thing since it improves readability.
| |
Accurate for both ppc and ppc64 according to https://fedora.juszkiewicz.com.pl/syscalls.html.
| |
Fixes: #8278
| |
Fixes: #8035
| |
Let's initialize static variables properly and get rid of redundant
variables.
| |
Just some additional whitespace, some additional assert()s, and removal of
redundant variables.
| |
gcc warns about uninitialized memory access because it notices that ssize_t which
is < 0 could be cast to positive int value. We know that this can't really
happen because only -1 can be returned, but OTOH, in principle a large
*positive* value cannot be cast properly. This is unlikely too, since xattrs
cannot be too large, but it seems cleaner to just use a size_t to return the
value and avoid the cast altogether. This makes the code simpler and gcc is
happy too.
The following warning goes away:
[113/1502] Compiling C object 'src/basic/basic@sta/xattr-util.c.o'.
In file included from ../src/basic/alloc-util.h:28:0,
from ../src/basic/xattr-util.c:30:
../src/basic/xattr-util.c: In function ‘fd_getcrtime_at’:
../src/basic/macro.h:207:60: warning: ‘b’ may be used uninitialized in this function [-Wmaybe-uninitialized]
UNIQ_T(A,aq) < UNIQ_T(B,bq) ? UNIQ_T(A,aq) : UNIQ_T(B,bq); \
^
../src/basic/xattr-util.c:155:19: note: ‘b’ was declared here
usec_t a, b;
^
| |
gcc complains that len might be used uninitialized, but afaict, this is not true.
| |
There isn't much difference, but in general we prefer to use the standard
functions. glibc provides reallocarray since version 2.26.
I moved the explicit_bzero configure test to the bottom, so that the two stdlib
functions are at the bottom.
| |
$ sudo systemd-run -p RootDirectory=/usr -E LD_LIBRARY_PATH=/lib/systemd/ -E SYSTEMD_LOG_LEVEL=debug /bin/systemd-detect-virt
Before
systemd-detect-virt[18498]: No virtualization found in DMI
systemd-detect-virt[18498]: No virtualization found in CPUID
systemd-detect-virt[18498]: Virtualization XEN not found, /proc/xen does not exist
systemd-detect-virt[18498]: This platform does not support /proc/device-tree
systemd-detect-virt[18498]: Failed to check for virtualization: No such file or directory
The first four lines are at debug level, so the user would only see that last
one usually, which is not very enlightening.
This now becomes:
systemd-detect-virt[21172]: No virtualization found in DMI
systemd-detect-virt[21172]: No virtualization found in CPUID
systemd-detect-virt[21172]: Virtualization XEN not found, /proc/xen does not exist
systemd-detect-virt[21172]: This platform does not support /proc/device-tree
systemd-detect-virt[21172]: /proc/cpuinfo not found, assuming no UML virtualization.
systemd-detect-virt[21172]: This platform does not support /proc/sysinfo
systemd-detect-virt[21172]: Found VM virtualization none
systemd-detect-virt[21172]: none
We do more checks, which is good too.
| |
Then it can be used in the asserts in logging functions without causing
infinite recursion. The error is just printed to stderr, it should be
good enough for the common case.
| |
gcc-8 throws an error if it knows snprintf might truncate output and the
return value is ignored:
../src/udev/udev-builtin-net_id.c: In function 'dev_pci_slot':
../src/udev/udev-builtin-net_id.c:297:47: error: '%s' directive output may be truncated writing up to 255 bytes into a region of size between 0 and 4095 [-Werror=format-truncation=]
snprintf(str, sizeof str, "%s/%s/address", slots, dent->d_name);
^~
../src/udev/udev-builtin-net_id.c:297:17: note: 'snprintf' output between 10 and 4360 bytes into a destination of size 4096
snprintf(str, sizeof str, "%s/%s/address", slots, dent->d_name);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
Let's check all return values. This actually makes the code better, because there's
no point in trying to open a file when the name has been truncated, etc.
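The pattern boils down to something like the following sketch. The helper
name format_address_path() and the choice of -ENAMETOOLONG as error code
are assumptions made for illustration, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats "<dir>/<name>/address" into 'buf'. Returns 0
 * on success, -ENAMETOOLONG if the result did not fit (or snprintf() failed),
 * so that the caller never tries to open a truncated path. */
static int format_address_path(char *buf, size_t size, const char *dir, const char *name) {
        int n;

        n = snprintf(buf, size, "%s/%s/address", dir, name);
        if (n < 0 || (size_t) n >= size)
                return -ENAMETOOLONG;

        return 0;
}
```

snprintf() returns the length the full output would have had, so a return
value greater than or equal to the buffer size means truncation happened.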
| |
If log_do_header() was called with overly long parameters, it'd generate
improper output. Essentially, it'd be truncated at a random point, in particular
missing a newline at the end, so it'd run with the next field, usually MESSAGE=.
log_do_header is called with parameters from compiled code (file name, line
number, etc), so in practice this was unlikely to ever be a problem, but it is
possible. In particular, if systemd was compiled from sources in some deeply
nested directory (which happens for example in mock and other build roots), the
filename could be very long.
As a safety measure, let's truncate all parameters to 256 bytes. So we have
5 fields which are 256 bytes (plus the field name prefix), and a few other
fields with fixed width. This must always fit in the 2048 byte buffer.
I don't think there's much gain in calculating the required length precisely,
since it's a lot of fields and a few bytes allocated on the stack don't matter.
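One way to get such a per-field limit is a printf precision, as in the
sketch below. Whether the patch uses exactly this mechanism is an
assumption; toy_format_header() and its reduced field list are made up for
illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced header formatter: "%.256s" caps each string field at
 * 256 bytes, so the terminating newline of every field always fits as long
 * as the buffer is sized for the worst case. Returns the length written, or
 * -1 if the buffer was too small after all. */
static int toy_format_header(char *buf, size_t size, const char *file, int line, const char *func) {
        int n;

        n = snprintf(buf, size,
                     "CODE_FILE=%.256s\n"
                     "CODE_LINE=%d\n"
                     "CODE_FUNC=%.256s\n",
                     file, line, func);

        return n >= 0 && (size_t) n < size ? n : -1;
}
```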
| |
log_dispatch_internal has only one caller where the extra_field/extra
params are not null: log_unit_full. When log_unit_full() was called,
when we got to log_dispatch_internal, our header would look like this:
PRIORITY=7
SYSLOG_FACILITY=3
CODE_FILE=../src/core/manager.c
CODE_LINE=2145
CODE_FUNC=manager_invoke_sigchld_event
USER_UNIT=gnome-terminal-server.service
65dffa7a3b984a6d9a46f0b8fb57710bUSER_INVOCATION_ID=
SYSLOG_IDENTIFIER=systemd
It took me a while to understand why I'm not seeing mangled messages in the
journal (after all, "" is a valid rvalue for log messages). The answer is that
journald rejects any field name which starts with a digit, and the MESSAGE_ID
that was used here starts with a digit. Hence, those lines would be silently
filtered out.
| |
It makes the code easier to read, because it's obvious that the function
cannot be called from elsewhere.
| |
The buffers are fixed size, so the message may not fit, but we don't
particularly care.
| |
This reverts commit a7419dbc59da5c8cc9e90b3d96bc947cad91ae16.
_All_ changes in that commit were wrong.
Fixes #8211.
| |
> logind sessions are mostly bound to the audit session concept, and audit
> sessions remain unaffected by "su", in fact they are defined to be
> "sealed off", i.e. in a way that if a process entered a session once, it
> will always stay with it, and so will its children, i.e. the only way to
> get a new session is by forking off something off PID 1 (or something
> similar) that never has been part of a session.
The code had a gap. user@.service is a special case PAM session which does
not create a logind session. Let's remember to check for it.
Fixes #8021
| |
This patch adds safe_atoux16 for parsing an unsigned hexadecimal 16bit int, and
uses that for parsing USB device and vendor IDs.
This fixes a compile error with gcc-8 because while we know that USB IDs are 2 bytes,
the compiler does not know that.
../src/udev/udev-builtin-hwdb.c:80:38: error: '%04X' directive output may be
truncated writing between 4 and 8 bytes into a region of size between 2 and 6
[-Werror=format-truncation=]
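For illustration, a parser with the same contract might look like the
sketch below. parse_hex_u16() is a hypothetical stand-in written for this
example, not the safe_atoux16() added by the patch. Because the result is
a uint16_t, the compiler can prove that "%04X" never needs more than four
characters.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in: parses an unsigned hexadecimal 16bit integer.
 * Returns 0 on success, -EINVAL on malformed input, -ERANGE if the value
 * does not fit. 'ret' is only written on success. */
static int parse_hex_u16(const char *s, uint16_t *ret) {
        char *end = NULL;
        unsigned long v;

        /* Refuse empty strings, leading whitespace and signs */
        if (!s || !isxdigit((unsigned char) s[0]))
                return -EINVAL;

        errno = 0;
        v = strtoul(s, &end, 16);
        if (errno != 0)
                return -errno;
        if (end == s || *end != '\0')
                return -EINVAL;
        if (v > UINT16_MAX)
                return -ERANGE;

        *ret = (uint16_t) v;
        return 0;
}
```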
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Signed-off-by: Patrick Uiterwijk <puiterwijk@redhat.com>
| |
Fixes: #7239
| |
So far, for all our API VFS mounts we used the fstype also as mount
source, let's do that for the cgroupsv2 mounts too. The kernel doesn't
really care about the source for API VFS, but it's visible to the user,
hence let's clean this up and follow the rule we otherwise follow.
| |
This new kernel 4.15 flag permits multiple BPF programs to be
executed for each packet processed: multiple per cgroup plus all
programs defined up the tree on all parent cgroups.
We can use this for two features:
1. Finally provide per-slice IP accounting (which was previously
unavailable)
2. Permit delegation of BPF programs to services (i.e. leaf nodes).
This patch beefs up PID1's handling of BPF to enable both.
Note two special items to keep in mind:
a. Our inner-node BPF programs (i.e. the ones we attach to slices) do
not enforce IP access lists, that's done exclusively in the leaf-node
BPF programs. That's a good thing, since that way rules in leaf nodes
can cancel out rules further up (i.e. for example to implement a
logic of "disallow everything except httpd.service"). Inner node BPF
programs do accounting however if that's requested. This is
beneficial for performance reasons: it means in order to provide
per-slice IP accounting we don't have to add up all child unit's
data.
b. When this code is run on pre-4.15 kernel (i.e. where
BPF_F_ALLOW_MULTI is not available) we'll make IP accounting on slice
units unavailable (i.e. revert to behaviour from before this commit).
For leaf nodes we'll fall back to non-ALLOW_MULTI mode however, which
means that BPF delegation is not available there at all, if IP
fw/acct is turned on for the unit. This is a change from earlier
behaviour, where we used the BPF_F_ALLOW_OVERRIDE flag, so that our
fw/acct would lose its effect as soon as delegation was turned on and
some client made use of that. I think the new behaviour is the safer
choice in this case, as silent bypassing of our fw rules is not
possible anymore. And if people want proper delegation then the way
out is a more modern kernel or turning off IP firewalling/acct for
the unit altogether.
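The fallback rules from items a. and b. can be condensed into a small
decision function. This is a heavily simplified sketch: the function name,
its parameters and the -EOPNOTSUPP error are hypothetical, and the flag
constant is a local stand-in that mirrors the value in <linux/bpf.h>
rather than including the kernel header. The real logic in PID 1 takes
more state into account.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Local stand-in mirroring BPF_F_ALLOW_MULTI from <linux/bpf.h> */
#define SKETCH_BPF_F_ALLOW_MULTI (1U << 1)

/* Hypothetical decision helper: which attach flags to use for the IP
 * fw/acct programs of a unit. Returns 0 and sets *ret_flags, or
 * -EOPNOTSUPP if the feature is unavailable for this kind of unit. */
static int sketch_pick_attach_flags(bool kernel_has_allow_multi, bool is_slice, unsigned *ret_flags) {

        if (kernel_has_allow_multi) {
                /* Our programs run in addition to those of parent cgroups
                 * and those installed by delegated clients */
                *ret_flags = SKETCH_BPF_F_ALLOW_MULTI;
                return 0;
        }

        /* Pre-4.15 kernel: no per-slice IP accounting */
        if (is_slice)
                return -EOPNOTSUPP;

        /* Leaf node: attach without ALLOW_MULTI and without ALLOW_OVERRIDE,
         * so a delegated client cannot silently replace our program */
        *ret_flags = 0;
        return 0;
}
```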
| |
We make heavy use of BPF functionality these days, hence expose the BPF
file system too by default now. (Note however, that we don't actually
make use of BPF file system objects yet, but we might later on.)
|
| |
This is an attempt to improve #8228 by extending the /run/nologin
message a bit, while still keeping it somewhat brief.
On purpose I used the vague wording "unprivileged user" rather than
"non-root user" so that pam_nologin can be updated to disable its
behaviour for members of the "wheel" group one day, and our messages
would still make sense.
See #8228.
| |
This is primarily preparation for a follow-up commit that adds a common
implementation of the other side of the reboot parameter file, i.e. the
code that reads the file and issues reboot() for it.
| |
This mimics the raw_clone() call we have in place already and
establishes a new syscall wrapper raw_reboot() that wraps the kernel's
reboot() system call in a bit more low-level fashion than glibc's
reboot() wrapper. The main difference is that the extra "arg" argument
is supported.
Ultimately this just replaces the syscall wrapper implementation we
currently have at three places in our codebase by a single one.
With this change, all our syscall() invocations are
neatly separated out in static inline system call wrappers in our header
files.
| |
Apparently, both __NR_ and SYS_ are useful, but we mostly use __NR_
hence use it for these two cases too, so that we settle on __NR_
exclusively.
| |
Fix a copy/paste mistake.
Fixes: #8238
| |
On a number of occasions we use FORK_CLOSE_ALL_FDS when forking off a
child, since we don't want to pass fds to the processes spawned (either
because we later want to execve() some other process there, or because
our child might hang around for longer than expected, in which case it
shouldn't keep our fd pinned). This also closes any logging fds, and
thus means logging is turned off in the child. If we want to do proper
logging, explicitly reopen the logs hence in the child at the right
time.
This is particularly crucial in the umount/remount children we fork off
the shutdown binary, as otherwise the children can't log, which is
why #8155 is harder to debug than necessary: the log messages we
generate about failing mount() system calls aren't actually visible on
screen, as they are done in the child processes where the log fds are
closed.
| |
Previously, we'd try to open kmsg on failure of the journal/syslog even
if no automatic fallback to kmsg was requested — and we wouldn't even
use the open connection afterwards...
| |
In meson.build we check that functions are available using:
meson.get_compiler('c').has_function('foo')
which checks the following:
- if __stub_foo or __stub___foo are defined, return false
- if foo is declared (a pointer to the function can be taken), return true
- otherwise check for __builtin_memfd_create
_stub is documented by glibc as
It defines a symbol '__stub_FUNCTION' for each function
in the C library which is a stub, meaning it will fail
every time called, usually setting errno to ENOSYS.
So if __stub is defined, we know we don't want to use the glibc version, but
this doesn't tell us if the name itself is defined or not. If it _is_ defined,
and we define our replacement as an inline static function, we get an error:
In file included from ../src/basic/missing.h:1358:0,
from ../src/basic/util.h:47,
from ../src/basic/calendarspec.h:29,
from ../src/basic/calendarspec.c:34:
../src/basic/missing_syscall.h:65:19: error: static declaration of 'memfd_create' follows non-static declaration
static inline int memfd_create(const char *name, unsigned int flags) {
^~~~~~~~~~~~
.../usr/include/bits/mman-shared.h:46:5: note: previous declaration of 'memfd_create' was here
int memfd_create (const char *__name, unsigned int __flags) __THROW;
^~~~~~~~~~~~
To avoid this problem, call our inline functions differently than glibc,
and use a #define to map the official name to our replacement.
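The renaming trick looks roughly like this, shown here for gettid(). This
is a minimal sketch: it forces HAVE_GETTID to 0 when the build system did
not define it, which is an assumption made so that the fallback path is
exercised, and it is Linux-only.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall(); may also make glibc declare gettid() */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef HAVE_GETTID
#define HAVE_GETTID 0 /* assumption for this sketch: pretend libc lacks it */
#endif

#if !HAVE_GETTID
/* Our replacement carries its own name, so it cannot clash with a
 * non-static declaration of gettid() that the libc headers may contain */
static inline pid_t missing_gettid(void) {
        return (pid_t) syscall(__NR_gettid);
}

/* Map the official name to the replacement for all code that follows */
#  define gettid missing_gettid
#endif
```

Callers keep writing gettid(); the preprocessor redirects the call to
missing_gettid() whenever the fallback is compiled in.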
Fixes #8099.
v2:
- use "missing_" as the prefix instead of "_"
v3:
- rebase and update for statx()
Unfortunately "statx" is also present in "struct statx", so the define
causes issues. Work around this by using a typedef.
I checked that systemd compiles with current glibc
(glibc-devel-2.26-24.fc27.x86_64) if HAVE_MEMFD_CREATE, HAVE_GETTID,
HAVE_PIVOT_ROOT, HAVE_SETNS, HAVE_RENAMEAT2, HAVE_KCMP, HAVE_KEYCTL,
HAVE_COPY_FILE_RANGE, HAVE_BPF, HAVE_STATX are forced to 0.
Setting HAVE_NAME_TO_HANDLE_AT to 0 causes an issue, but it's not because of
the define, but because of struct file_handle.
| |
When synthetisation is turned off, there's just too many ways those tests can
go wrong. We are not interested in verifying that the db on disk is correct,
let's just skip all checks.
In the first version of this patch, I recorded if we detected a mismatch during
configuration and only skipped tests in that case, but actually it is possible
to change the host configuration between our configuration phase and running
of the tests. It's just more robust to skip always. (This is particularly true
if tests are installed.)
| |
This makes it easier to see what is going on. Crashes may happen in a
nested test_{uid,gid}_to_name_one() function, and the default backtrace
doesn't show the actual string being tested.
|
| |
|