| Commit message (Collapse) | Author | Age |
|
|
|
| |
It's a fairly specialized function. Let's make new files for it and the tests.
|
|
|
|
|
| |
Add a comment about the return value and rename r to ans. r is
nowadays reserved for the integer return value, and char *r is confusing.
|
|
|
|
|
| |
If it writes to memory, it's not pure, by definition.
Fixup for 882ac6e769c5c.
|
|
|
|
|
|
|
|
| |
It also used __bitwise and __force. It seems easier to rename
our versions since they are local to this one single header.
Also, undefine them afterwards, so that we don't pollute the
preprocessor macro namespace.
|
|
|
|
|
|
|
|
|
|
| |
Ubuntu 14.04 (Trusty) kernel header packages ship without
<linux/vm_sockets.h>. Only struct sockaddr_vm and VMADDR_CID_ANY will
be needed by elogind and they are simple enough to go in missing.h.
CentOS 7 <sys/socket.h> does not define AF_VSOCK. Define it so the code
can compile although actual socket(2) calls may fail at runtime if the
address family isn't available.
|
|
|
|
|
|
|
| |
gperf-3.1 generates lookup functions that take a size_t length
parameter instead of unsigned int. Test for this at configure time.
Fixes: https://github.com/elogind/elogind/issues/5039
|
| |
|
|
|
|
|
|
|
| |
If a callback of an event source returns an error, then the event source
might already be half-destroyed, if the callback dropped all refs.
Hence, don't assume that the type is still valid, and save it before we
issue the callback.
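The hazard described above can be illustrated with a minimal sketch. All names here (source, source_new(), source_unref(), source_dispatch(), failing_callback()) are hypothetical stand-ins and not the actual event loop API; the only point is that the type is copied before the callback runs, and the object is not touched afterwards on the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef enum { SOURCE_IO, SOURCE_TIME, SOURCE_SIGNAL } source_type;

typedef struct source source;
struct source {
        unsigned n_ref;
        source_type type;
        int (*callback)(source *s);
};

/* Hypothetical constructor: returns a source with one reference, or NULL. */
static source *source_new(source_type type, int (*callback)(source *s)) {
        source *s;

        s = malloc(sizeof(*s));
        if (!s)
                return NULL;

        s->n_ref = 1;
        s->type = type;
        s->callback = callback;
        return s;
}

static void source_unref(source *s) {
        assert(s);
        assert(s->n_ref > 0);

        if (--s->n_ref == 0)
                free(s);
}

/* Example callback that drops the last reference and then fails. */
static int failing_callback(source *s) {
        source_unref(s);
        return -EIO;
}

/* Dispatch one source. On failure, *ret_failed_type receives the type
 * the source had before its callback ran. */
static int source_dispatch(source *s, source_type *ret_failed_type) {
        source_type saved_type;
        int r;

        assert(s);
        assert(ret_failed_type);

        saved_type = s->type;   /* copy before the callback may free s */

        r = s->callback(s);
        if (r < 0)
                *ret_failed_type = saved_type;   /* s is not dereferenced here */

        return r;
}
```

Reading s->type after the callback returned would be a use-after-free in this scenario, which is why the copy is taken up front.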
|
|
|
|
| |
Also, add tests to make sure this actually works as intended.
|
|
|
|
|
|
|
|
| |
The AF_VSOCK address family facilitates guest<->host communication on
VMware and KVM (virtio-vsock). Adding support to elogind allows guest
agents to be launched through .socket unit files. Today guest agents
are stand-alone daemons running inside guests that do not take advantage
of elogind socket activation.
|
|
|
|
|
|
|
|
| |
sockaddr_port() either returns a >= 0 port number or a negative errno.
This works for AF_INET and AF_INET6 because port ranges are only 16-bit.
In AF_VSOCK ports are 32-bit so an int cannot represent all port numbers
and negative errnos. Separate the port and the return code.
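A rough sketch of the separated interface follows. The function name sockaddr_get_port() and its exact signature are assumptions made for illustration; AF_VSOCK itself is left out so that the example builds without <linux/vm_sockets.h>, but an unsigned output parameter would have room for its 32-bit ports.

```c
#include <assert.h>
#include <errno.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 0 and stores the port in *ret_port, or
 * returns a negative errno. The port no longer shares the return value
 * with the error code, so its full unsigned range is usable. */
static int sockaddr_get_port(const struct sockaddr *sa, unsigned *ret_port) {
        assert(sa);
        assert(ret_port);

        switch (sa->sa_family) {

        case AF_INET:
                *ret_port = ntohs(((const struct sockaddr_in*) sa)->sin_port);
                return 0;

        case AF_INET6:
                *ret_port = ntohs(((const struct sockaddr_in6*) sa)->sin6_port);
                return 0;

        default:
                /* An AF_VSOCK case would store its 32-bit port here as well. */
                return -EAFNOSUPPORT;
        }
}
```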
|
| |
|
|
|
|
|
| |
If a hex string has an uneven length, generate an error instead of
silently assuming a trailing '0' was in place.
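The stricter behaviour can be sketched as follows. decode_hex() and hex_digit_value() are hypothetical names chosen for this example, not the functions touched by the commit; the sketch only shows the odd-length check happening before any decoding.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hex_digit_value(char c) {
        if (c >= '0' && c <= '9')
                return c - '0';
        if (c >= 'a' && c <= 'f')
                return c - 'a' + 10;
        if (c >= 'A' && c <= 'F')
                return c - 'A' + 10;
        return -1;
}

/* Hypothetical decoder: returns 0 on success, -EINVAL on malformed input
 * (including an odd number of digits), -ENOMEM on allocation failure. */
static int decode_hex(const char *s, uint8_t **ret, size_t *ret_len) {
        uint8_t *buf;
        size_t n, i;

        assert(s);
        assert(ret);
        assert(ret_len);

        n = strlen(s);
        if (n % 2 != 0)
                return -EINVAL;   /* refuse instead of padding with '0' */

        buf = malloc(n / 2 + 1);  /* + 1 so that we never request 0 bytes */
        if (!buf)
                return -ENOMEM;

        for (i = 0; i < n; i += 2) {
                int hi = hex_digit_value(s[i]);
                int lo = hex_digit_value(s[i + 1]);

                if (hi < 0 || lo < 0) {
                        free(buf);
                        return -EINVAL;
                }

                buf[i / 2] = (uint8_t) (hi << 4 | lo);
        }

        *ret = buf;
        *ret_len = n / 2;
        return 0;
}
```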
|
|
|
|
| |
Fixes #1188.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This improves kernel command line parsing in a number of ways:
a) A kernel option "foo_bar=xyz" is now considered equivalent to
"foo-bar=xyz", i.e. when comparing kernel command line option names "-" and
"_" are now considered equivalent (this only applies to the option names
though, not the option values!). Most of our kernel options used "-" as word
separator in kernel command line options so far, but some used "_", which
was a source of confusion for users (well, at least of
one user: myself, I just couldn't remember that it's elogind.debug-shell,
not elogind.debug_shell). Considering both as equivalent is inspired by how
modern kernel module loading normalizes all kernel module names to use
underscores now too.
b) All options previously using a dash for separating words in kernel command
line options now use an underscore instead, in all documentation and in
code. Since a) has been implemented this should not create any compatibility
problems, but normalizes our documentation and our code.
c) All kernel command line options which take booleans (or are boolean-like)
have been reworked so that "foobar" (without argument) is now equivalent to
"foobar=1" (but not "foobar=0"), thus normalizing the handling of our
boolean arguments. Specifically this means elogind.debug-shell and
elogind_debug_shell=1 are now entirely equivalent.
d) All kernel command line options which take an argument, and where no
argument is specified will now result in a log message. e.g. passing just
"elogind.unit" will now result in a complaint that it needs an argument. This
is implemented in the proc_cmdline_missing_value() function.
e) There's now a call proc_cmdline_get_bool() similar to proc_cmdline_get_key()
that parses booleans (following the logic explained in c).
f) The proc_cmdline_parse() call's boolean argument has been replaced by a new
flags argument that takes a common set of bits with proc_cmdline_get_key().
g) All kernel command line APIs now begin with the same "proc_cmdline_" prefix.
h) There are now tests for much of this. Yay!
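Points a) and c) can be illustrated with a small sketch. The helpers below (key_matches(), cmdline_word_get_bool()) are hypothetical and are not the proc_cmdline_*() functions named above; the accepted boolean spellings are likewise an assumption. The sketch only shows that "-" and "_" compare equal in the option name and that a bare option counts as true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* In option names, treat '-' and '_' as the same character. */
static char normalize_key_char(char c) {
        return c == '-' ? '_' : c;
}

/* True if the first n bytes of word equal key, modulo '-' versus '_'. */
static bool key_matches(const char *word, size_t n, const char *key) {
        size_t i;

        if (strlen(key) != n)
                return false;

        for (i = 0; i < n; i++)
                if (normalize_key_char(word[i]) != normalize_key_char(key[i]))
                        return false;

        return true;
}

/* Hypothetical helper: returns 1 or 0 for a boolean value, and -1 if the
 * word is a different option or its value is not a recognized boolean. */
static int cmdline_word_get_bool(const char *word, const char *key) {
        const char *value;
        size_t n;

        assert(word);
        assert(key);

        n = strcspn(word, "=");   /* only the name part is normalized */
        if (!key_matches(word, n, key))
                return -1;

        if (word[n] == '\0')
                return 1;         /* "foobar" alone behaves like "foobar=1" */

        value = word + n + 1;
        if (strcmp(value, "1") == 0 || strcmp(value, "yes") == 0 ||
            strcmp(value, "true") == 0 || strcmp(value, "on") == 0)
                return 1;
        if (strcmp(value, "0") == 0 || strcmp(value, "no") == 0 ||
            strcmp(value, "false") == 0 || strcmp(value, "off") == 0)
                return 0;

        return -1;
}
```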
|
|
|
|
|
|
| |
If we want to parse the kernel command line, let's check the
$SYSTEMD_PROC_CMDLINE environment variable first. This is useful for debugging
purposes.
|
|
|
|
|
|
|
|
|
|
|
| |
elogind.journal-fields(7) documents CODE_FUNC=. Internally, we were
inconsistent: sd_journal_print uses CODE_FUNC=, log.h has CODE_FUNCTION=,
python-elogind and bootchart also used CODE_FUNC=, when they were internal.
Most external projects use sd_journal_* functions, so CODE_FUNC=,
python-elogind still uses CODE_FUNC=, as does elogind-bootchart, and
independent reimplementations in golang-github-coreos-go-elogind, qtbase,
network manager, glib, pulseaudio. Hence, I don't think there's much
choice.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Those square brackets don't fit how our other messages look; we use colons
everywhere else. The "[a:b]" format was originally added in
ed5bcfbe3c3b68e59242c03649eea03a9707d318, and remained unchanged for 7 years,
but in the meantime other conventions evolved.
The new version is also one character shorter.
[/etc/elogind/system/elogind-networkd.service.d/override.conf:2] Failed to parse sec value, ignoring: ...
↓
/etc/elogind/system/elogind-networkd.service.d/override.conf:2: Failed to parse sec value, ignoring: ...
|
|
|
|
| |
We can take advantage of the fact a NULL argument terminates the list.
|
|
|
|
|
|
|
|
|
|
| |
Our warning message was misleading, because we wouldn't "correct" anything,
we'd just ignore unknown escapes. Update the message.
Also, print just the extracted word (which contains the offending sequences) in
the message, instead of the whole line.
Fixes #4697.
|
|
|
|
| |
Let's print a proper message if we see MS_MOVE.
|
| |
|
|
|
|
| |
A simple wrapper around fd_is_temporary_fs().
|
|
|
|
| |
Also, O_NOCTTY is a safer bet, let's add that too.
|
|
|
|
|
|
| |
Let's use chase_symlinks() when looking for /etc/os-release and
/usr/lib/os-release as these files might be symlinks (and actually are IRL on
some distros).
|
|
|
|
|
|
| |
Let's permit invoking chase_symlinks() with a NULL return parameter. If so, the
resolved name is not returned, and the call is useful for checking for the existence of
a file, without actually returning its ultimate path.
|
|
|
|
| |
containers
|
|
|
|
|
|
| |
We want that elogind --user gets its own keyring as usual, even if the
barebones PAM snippet we ship upstream is used. If we don't do this we get the
basic keyring elogind --system sets up for us.
|
|
|
|
|
|
|
|
|
| |
PR_SET_MM_ARG_START allows us to relatively cleanly implement process renaming.
However, it's only available with privileges. Hence, let's try to make use of
it, and if we can't, fall back to the traditional way of overriding argv[0].
This removes size restrictions on the process name shown in argv[] at least for
privileged processes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, elogind-detect-virt was unable to detect "elogind-nspawn -a"
container environments, i.e. where PID 1 is a stub process running in host
context, as in that case /proc/1/environ was inherited from the host. Let's
improve that, and add an additional check for container environments where
/proc/1/environ is not cleaned up and does not contain the $container
environment variable:
The /proc/1/sched file shows the host PID in the first line. If this is not
1, we know we are running in a PID namespace (but not which implementation).
With these changes we should be able to detect container environments that
don't set $container at all.
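The check on the first line of /proc/1/sched could be sketched like this. The sketch assumes the line has the form "name (PID, #threads: N)"; parse_sched_pid() is a hypothetical helper that works on an already-read line, so reading the file itself is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract the PID from a line such as
 * "systemd (1, #threads: 1)". Returns 0 on success, -EIO if the line
 * does not look as expected. */
static int parse_sched_pid(const char *line, long *ret_pid) {
        const char *p;
        char *end;
        long v;

        assert(line);
        assert(ret_pid);

        /* The process name may itself contain '(', hence search backwards. */
        p = strrchr(line, '(');
        if (!p)
                return -EIO;
        p++;

        errno = 0;
        v = strtol(p, &end, 10);
        if (errno != 0 || end == p || *end != ',' || v <= 0)
                return -EIO;

        *ret_pid = v;
        return 0;
}
```

A caller that sees a PID other than 1 for /proc/1/sched may conclude that it runs inside a PID namespace, as described above.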
|
|
|
|
|
|
| |
1. Listed in TODO.
2. Tree-wide replace safe_atou16 with parse_ip_port in case
it's used for ports.
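What a dedicated port parser adds over a plain 16-bit integer parser might look as follows. This is a sketch under assumptions: parse_ip_port_sketch() is a hypothetical name, and rejecting port 0 is an assumed policy for this example rather than something stated in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal IP port. Returns 0 on success,
 * -EINVAL for non-numeric input, -ERANGE for values outside 1..65535. */
static int parse_ip_port_sketch(const char *s, uint16_t *ret) {
        unsigned long v;
        char *end;

        assert(s);
        assert(ret);

        /* Require a leading digit: no sign, no whitespace, no empty string. */
        if (s[0] < '0' || s[0] > '9')
                return -EINVAL;

        errno = 0;
        v = strtoul(s, &end, 10);
        if (*end != '\0')
                return -EINVAL;
        if (errno == ERANGE || v == 0 || v > 65535)
                return -ERANGE;

        *ret = (uint16_t) v;
        return 0;
}
```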
|
|
|
|
|
|
|
|
|
| |
Let's accept "µs" as alternative time unit for microseconds. We already accept
"us" and "usec" for them, let's extend on this and accept the proper scientific
unit specification too.
We will never output this as time unit, but it's fine to accept it, after all
we are pretty permissive with time units already.
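Accepting the extra spelling amounts to one more row in a suffix table. The sketch below uses a hypothetical function usec_multiplier_from_suffix() and a shortened table; it assumes the "µ" is the micro sign U+00B5 encoded as UTF-8.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map a time unit suffix to its length in
 * microseconds. Returns 0 on success, -EINVAL for unknown suffixes. */
static int usec_multiplier_from_suffix(const char *suffix, uint64_t *ret) {
        static const struct {
                const char *suffix;
                uint64_t usec;
        } table[] = {
                { "us",        1 },
                { "usec",      1 },
                { "\xc2\xb5s", 1 },        /* "µs", U+00B5 in UTF-8 */
                { "ms",        1000 },
                { "msec",      1000 },
                { "s",         1000000 },
                { "sec",       1000000 },
        };
        size_t i;

        assert(suffix);
        assert(ret);

        for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
                if (strcmp(suffix, table[i].suffix) == 0) {
                        *ret = table[i].usec;
                        return 0;
                }

        return -EINVAL;
}
```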
|
|
|
|
|
| |
This means that callers can distinguish an error from flags==0,
and don't have to special-case the empty string.
|
|
|
|
| |
DEFINE_TRIVIAL_CLEANUP_FUNC() already does that check, no need to duplicate it.
|
|
|
|
| |
Follow-up to #4687 and e7330dfe14b1965f.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Let's store the invocation ID in the per-service keyring as a root-owned key,
with strict access rights. This has the advantage over the environment-based ID
passing that it also works from SUID binaries (as the key cannot be overridden
by unprivileged code starting them), in contrast to the secure_getenv() based
mode.
The invocation ID is now passed in three different ways to a service:
- As environment variable $INVOCATION_ID. This is easy to use, but may be
overridden by unprivileged code (which might be a bad or a good thing), which
means it's incompatible with SUID code (see above).
- As extended attribute on the service cgroup. This cannot be overridden by
unprivileged code, and may be queried safely from "outside" of a service.
However, it is incompatible with containers right now, as unprivileged
containers generally cannot set xattrs on cgroupfs.
- As "invocation_id" key in the kernel keyring. This has the benefit that the
key cannot be changed by unprivileged service code, and thus is safe to
access from SUID code (see above). But do note that service code can replace
the session keyring with a fresh one that lacks the key. However in that case
the key will not be owned by root, which is easily detectable. The keyring is
also incompatible with containers right now, as it is not properly namespace
aware (but this is being worked on), and thus most container managers mask
the keyring-related system calls.
Ideally we'd only have one way to pass the invocation ID, but the different
ways all have limitations. The invocation ID hookup in journald is currently
only available on the host but not in containers, due to the mentioned
limitations.
How to verify the new invocation ID in the keyring:
# elogind-run -t /bin/sh
Running as unit: run-rd917366c04f847b480d486017f7239d6.service
Press ^] three times within 1s to disconnect TTY.
# keyctl show
Session Keyring
680208392 --alswrv 0 0 keyring: _ses
250926536 ----s-rv 0 0 \_ user: invocation_id
# keyctl request user invocation_id
250926536
# keyctl read 250926536
16 bytes of data in key:
9c96317c ac64495a a42b9cd7 4f3ff96b
# echo $INVOCATION_ID
9c96317cac64495aa42b9cd74f3ff96b
# ^D
This creates a new transient service running a shell. It then verifies the
contents of the keyring, requests the invocation ID key, and reads its payload.
For comparison the invocation ID as passed via the environment variable is also
displayed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch ensures that each system service gets its own session kernel keyring
automatically, and implicitly. Without this a keyring is allocated for it
on-demand, but is then linked with the user's kernel keyring, which is OK
behaviour for logged in users, but not so much for system services.
With this change each service gets a session keyring that is specific to the
service and ceases to exist when the service is shut down. The session keyring
is not linked up with the user keyring and keys hence only search within the
session boundaries by default.
(This is useful in a later commit to store per-service material in the keyring,
for example the invocation ID)
(With input from David Howells)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
_cleanup_
This adds mkdtemp_malloc() that is a combination of mkdtemp() plus strdup(). It
initializes its return parameter only if the temporary directory could be
created successfully, so that the parameter is exactly non-NULL when the
directory exists.
rmdir_and_free() and rmdir_and_freep() are also added, and the latter may be
used inside of _cleanup_ for such a directory string variable, to automatically
rmdir() the directory if it is non-NULL when the scope exits.
rmdir_and_free() is similar to the existing rm_rf_and_free(); however, it only
removes a single directory and does not operate recursively.
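The pattern described here can be sketched as below. The functions carry a _sketch suffix because they are illustrative reimplementations written for this example, not the code added by the commit; the /tmp template path is likewise an assumption. The cleanup attribute is the GCC/Clang extension that the _cleanup_ macro is built on.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* mkdtemp() plus strdup(): *ret is only set if the directory was created,
 * so a non-NULL *ret means the directory exists. */
static int mkdtemp_malloc_sketch(const char *template, char **ret) {
        char *p;

        assert(template);
        assert(ret);

        p = strdup(template);
        if (!p)
                return -ENOMEM;

        if (!mkdtemp(p)) {
                int r = -errno;

                free(p);
                return r;
        }

        *ret = p;
        return 0;
}

/* Cleanup handler: remove the (empty) directory and free the string. */
static void rmdir_and_freep_sketch(char **p) {
        if (!*p)
                return;

        (void) rmdir(*p);
        free(*p);
        *p = NULL;
}

/* Example user: the directory is gone again when the scope is left. */
static int use_temporary_directory(void) {
        __attribute__((cleanup(rmdir_and_freep_sketch))) char *dir = NULL;
        int r;

        r = mkdtemp_malloc_sketch("/tmp/sketch-XXXXXX", &dir);
        if (r < 0)
                return r;   /* dir is still NULL, nothing to clean up */

        return access(dir, F_OK) == 0 ? 0 : -errno;
}
```

Since the handler uses rmdir(), a directory that still has entries in it is left behind, which matches the non-recursive behaviour described above.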
|
|
|
|
| |
As suggested by @keszybz
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a concept of "extrinsic" mounts. If mounts are extrinsic we consider
them managed by something else and do not add automatic ordering against
umount.target, local-fs.target, remote-fs.target.
Extrinsic mounts are considered:
- All mounts if we are running in --user mode
- API mounts such as everything below /proc, /sys, /dev, which exist from
earliest boot to latest shutdown.
- All mounts marked as initrd mounts, if we run on the host
- The initrd's private directory /run/initramfs that should survive until last
reboot.
This primarily merges a couple of different exclusion lists into a single
concept.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
So far elogind-nspawn container has been creating files under
/run/elogind/inaccessible, no matter whether it's running in user
namespace or not. That's fine for regular files, dirs, socks, fifos.
However, it's not for block and character devices, because kernel
doesn't allow them to be created under user namespace. It results
in warnings at booting like that:
====
Couldn't stat device /run/elogind/inaccessible/chr
Couldn't stat device /run/elogind/inaccessible/blk
====
Thus we need to have the cgroups whitelisting handler to silently ignore
a file, when the device path is prefixed with "-". That's exactly the
same convention used in directives like ReadOnlyPaths=. Also insert the
prefix "-" to inaccessible entries.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This new flag controls whether to consider a problem if the referenced path
doesn't actually exist. If specified it's OK if the final file doesn't exist.
Note that this permits one or more final components of the path not to exist,
but these must not contain "../" for safety reasons (or, to be extra safe,
neither "./" and a couple of others, i.e. what path_is_safe() permits).
This new flag is useful when resolving paths before issuing an mkdir() or
open(O_CREAT) on a path, as it permits that the file or directory is created
later.
The return code of chase_symlinks() is changed to return 1 if the file exists,
and 0 if it doesn't. The latter is only returned in case CHASE_NON_EXISTING is
set.
|
|
|
|
|
|
| |
Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to
chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of
chase_symlinks_prefix().
|
|
|
|
|
|
|
|
|
| |
Previously, we'd generate an EINVAL error if it is attempted to escape a root
directory with relative ".." symlinks. With this commit this is changed so that
".." from the root directory is a NOP, following the kernel's own behaviour
where /.. is equivalent to /.
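The new rule can be shown with a purely lexical sketch. This is not chase_symlinks(): resolve_dots() is a hypothetical helper that follows no symlinks at all and only demonstrates that ".." at the root stays at the root instead of producing an error.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: lexically resolve "." and ".." in an absolute
 * path. Returns a newly allocated string, or NULL if the path is not
 * absolute or memory is exhausted. */
static char *resolve_dots(const char *path) {
        const char *p;
        size_t out = 0;
        char *buf;

        assert(path);

        if (path[0] != '/')
                return NULL;

        /* The result is never longer than the input. */
        buf = malloc(strlen(path) + 2);
        if (!buf)
                return NULL;

        p = path;
        for (;;) {
                size_t n;

                p += strspn(p, "/");      /* skip slashes */
                n = strcspn(p, "/");      /* length of next component */
                if (n == 0)
                        break;

                if (n == 1 && p[0] == '.') {
                        /* "." refers to the current directory, skip it */
                } else if (n == 2 && p[0] == '.' && p[1] == '.') {
                        /* Drop the last component. If we are already at
                         * the root (out == 0) this is a NOP. */
                        while (out > 0 && buf[--out] != '/')
                                ;
                } else {
                        buf[out++] = '/';
                        memcpy(buf + out, p, n);
                        out += n;
                }

                p += n;
        }

        if (out == 0)
                buf[out++] = '/';
        buf[out] = '\0';

        return buf;
}
```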
As suggested by @keszybz.
|
|
|
|
|
|
|
|
| |
chase_symlinks() currently expects a fully qualified, absolute path, relative
to the host's root as first argument. Which is useful in many ways, and similar
to the paths unlink(), rename(), open(), … expect. Sometimes it's however
useful to first prefix the specified path with the specified root directory.
Add a new call chase_symlinks_prefix() for this, that is a simple wrapper.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Let's be a bit more careful when detecting chroot() environments, so that we
can discern them from namespaced environments.
Previously this would simply check if the root directory of PID 1 matches our
own root directory. With this commit, we also check whether the namespaces of
PID 1 and ourselves are the same. If not we assume we are running inside of a
namespaced environment instead of a chroot() environment.
This has the benefit that systemctl (which uses running_in_chroot()) will work
as usual when invoked in a namespaced service.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v232's cgroup hybrid mode mounted v2 on /sys/fs/cgroup/elogind, which
unfortunately broke other tools which expect v1 there. From v233 on, hybrid
mode instead mounts and uses v2 on /sys/fs/cgroup/unified and keeps
/sys/fs/cgroup/elogind on v1 for compatibility with external tools. However,
to keep elogind live upgrades working, v233+ should be able to recognize v232
layout and keep using it.
This patch adds v232 hybrid mode support. If v232 layout is detected,
cg_unified(SYSTEMD_CGROUP_CONTROLLER) keeps returning %true but
cg_hybrid_unified() returns %false. This keeps process management on cgroup v2
but turns off the parallel layout.
|