| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a new safe_fork() wrapper around fork() and makes use of it
everywhere. The new wrapper does a couple of things we previously did
manually and separately in a safer, more correct and automatic way:
1. Optionally resets signal handlers/mask in the child
2. Sets a name on all processes we fork off right after forking off (and
the patch assigns useful names for all processes we fork off now,
following a systematic naming scheme: always enclosed in () – in order
to indicate that these are not proper, exec()ed processes, but only
forked off children, and if the process is long-running with only our
own code, without execve()'ing something else, it gets am "sd-" prefix.)
3. Optionally closes all file descriptors in the child
4. Optionally sets a PR_SET_DEATHSIG to SIGTERM in the child, in a safe
way so that the parent dying before this happens being handled
safely.
5. Optionally reopens the logs
6. Optionally connects stdin/stdout/stderr to /dev/null
7. Debug logs about the forked off processes.
|
|
|
|
|
|
|
|
|
|
| |
Ultimately, O_CLOEXEC should be off in fd 0, 1, 2, but when we open
/dev/null here it's unlikely to be < 0, and after dupping the fd to 0,
1, 2 we turn off O_CLOEXEC explicitly anyway.
Unless we know that what we are about to open will return 0, 1 or 2 we
should always set O_CLOEXEC in order to be safe to other threads forking
of subprocesses at the wrong moment.
|
|
|
|
|
|
|
|
| |
Just a minor tweak, making sure we execute as much as we can of the
funciton, but return the first error instead of the last we encounter.
This is usuelly how we do things when we have functions that continue on
the first error, so let's do it like that here too.
|
|
|
|
|
| |
Our own calls return errors in their return values, hence use that
rather than errno when checking errors.
|
| |
|
| |
|
|
|
|
| |
It just seems strange to have it in a different file if mkdir-label.c exists.
|
| |
|
|
|
|
|
| |
We'd pass pointers to mkdir and mkdir_label to call in various places. mkdir
returns the error in errno while mkdir_label returns the error directly.
|
|
|
|
| |
And use them where they can be applicable.
|
| |
|
|
|
|
|
| |
This works supports to configure L3S mode and flags
such as bridge, private and vepa
|
|
|
|
| |
In preparation for future changes.
|
|
|
|
| |
Also include missing.h in dissect-image.c to pick it up.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
(#7645)
This makes things a bit easier to read I think, and also makes sure we
always use the _unlikely_ wrapper around it, which so far we used
sometimes and other times we didn't. Let's clean that up.
|
|
|
|
|
|
| |
Followup to previous commit. Suggested by @poettering.
Reindented the `verbs[]` tables to match the apparent previous
whitespace rules (indent to one flag, allow multiple flags to overflow?).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A lot of code references the `running_in_chroot()` function; while
I didn't dig I'm pretty certain this arose to deal with situations
like RPM package builds in `mock` - there we don't want the `%post`s
to `systemctl start` for example.
And actually this exact same use case arises for
[rpm-ostree](https://github.com/projectatomic/rpm-ostree/)
where we implement offline upgrades by default; the `%post`s are
always run in a new chroot using [bwrap](https://github.com/projectatomic/bubblewrap).
And here's the problem: bwrap creates proper mount roots, so it
passes `running_in_chroot()`, and then if a script tries to do
`systemctl start` we get:
`System has not been booted with systemd as init system (PID 1)`
but that's an *error*, unlike the `running_in_chroot()` case where we ignore.
Further complicating things is there are real world RPM packages
like `glusterfs` which end up invoking `systemctl start`.
A while ago, the `SYSTEMD_IGNORE_CHROOT` environment variable was
added for the inverse case of running in a chroot, but still wanting
to use systemd as PID 1 (presumably some broken initramfs setups?).
Let's introduce a `SYSTEMD_OFFLINE` environment variable for cases like
mock/rpm-ostree so we can force on the "ignore everything except preset" logic.
This way we'll still not start services even if mock switches to use nspawn or
bwrap or something else that isn't a chroot.
We also cleanly supercede the `SYSTEMD_IGNORE_CHROOT=1` which is now spelled
`SYSTEMD_OFFLINE=0`. (Suggested by @poettering)
Also I made things slightly nicer here and we now print the ignored operation.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remount, and subsequent umount, attempts can hang for inaccessible network
based mount points. This can leave a system in a hard hang state that
requires a hard reset in order to recover. This change moves the remount,
and umount attempts into separate child processes. The remount and umount
operations will block for up to 90 seconds (DEFAULT_TIMEOUT_USEC). Should
those waits fail, the parent will issue a SIGKILL to the child and continue
with the shutdown efforts.
In addition, instead of only reporting some additional errors on the final
attempt, failures are reported as they occur.
|
|
|
|
|
|
| |
Let's employ coccinelle to do this for us.
Follow-up for #7625.
|
|
|
|
|
|
| |
These helper calls are potentially called often, and allocate FILE*
objects internally for a very short period of time, let's turn off
locking for them too.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In meson.build we check that functions are available using:
meson.get_compiler('c').has_function('foo')
which checks the following:
- if __stub_foo or __stub___foo are defined, return false
- if foo is declared (a pointer to the function can be taken), return true
- otherwise check for __builtin_memfd_create
_stub is documented by glibc as
It defines a symbol '__stub_FUNCTION' for each function
in the C library which is a stub, meaning it will fail
every time called, usually setting errno to ENOSYS.
So if __stub is defined, we know we don't want to use the glibc version, but
this doesn't tell us if the name itself is defined or not. If it _is_ defined,
and we define our replacement as an inline static function, we get an error:
In file included from ../src/basic/missing.h:1358:0,
from ../src/basic/util.h:47,
from ../src/basic/calendarspec.h:29,
from ../src/basic/calendarspec.c:34:
../src/basic/missing_syscall.h:65:19: error: static declaration of 'memfd_create' follows non-static declaration
static inline int memfd_create(const char *name, unsigned int flags) {
^~~~~~~~~~~~
.../usr/include/bits/mman-shared.h:46:5: note: previous declaration of 'memfd_create' was here
int memfd_create (const char *__name, unsigned int __flags) __THROW;
^~~~~~~~~~~~
To avoid this problem, call our inline functions different than glibc,
and use a #define to map the official name to our replacement.
Fixes #8099.
v2:
- use "missing_" as the prefix instead of "_"
v3:
- rebase and update for statx()
Unfortunately "statx" is also present in "struct statx", so the define
causes issues. Work around this by using a typedef.
I checked that systemd compiles with current glibc
(glibc-devel-2.26-24.fc27.x86_64) if HAVE_MEMFD_CREATE, HAVE_GETTID,
HAVE_PIVOT_ROOT, HAVE_SETNS, HAVE_RENAMEAT2, HAVE_KCMP, HAVE_KEYCTL,
HAVE_COPY_FILE_RANGE, HAVE_BPF, HAVE_STATX are forced to 0.
Setting HAVE_NAME_TO_HANDLE_AT to 0 causes an issue, but it's not because of
the define, but because of struct file_handle.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
fputs() writes only first 2048 bytes and fails
to write to /proc when values are larger than that.
This patch adds a new flag to WriteStringFileFlags
that make it possible to disable the buffer under
specific cases.
|
|
|
|
|
|
|
|
|
|
| |
Using strlen() to declare a buffer results in a variable-length array,
even if the compiler likely optimizes it to be a compile time constant.
When building with -Wvla, certain versions of gcc complain about such
buffers. Compiling with -Wvla has the advantage of preventing variably
length array, which defeat static asserts that are implemented by
declaring an array of negative length.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
expression
While the compiler likely optimizes strlen(x) for string literals,
it is not a constant expression.
Hence,
char buffer[strlen("OPTION_000") + 1];
declares a variable-length array. STRLEN() can be used instead
when a constant espression is needed.
It's not entirely identical to strlen(), as STRLEN("a\0") counts 2.
Also, it only works with string literals and the macro enforces
that the argument is a literal.
|
|
|
|
|
| |
Given that we regularly have verbs that require privileges, let's just
make this a flag of the verb.
|
|
|
|
|
|
|
| |
Our CODING_STYLE suggests not comparing with NULL, but relying on C's
downgrade-to-bool feature for that. Fix up some code to match these
guidelines. (This is not comprehensive, the coccinelle output for this
is unfortunately kinda borked)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The detection of ConditionVirtualisation= relies on the presence of
/proc/xen/capabilities. If the file exists and contains the string
"control_d", the running system is a dom0 and VIRTUALIZATION_NONE should
be set. In case /proc/xen exists, or some sysfs files indicate "xen",
VIRTUALIZATION_XEN should be set to indicate the system is a domU.
With an (old) xenlinux based kernel, /proc/xen/capabilities is always
available and the detection described above works always. But with a
pvops based kernel, xenfs must be mounted on /proc/xen to get
"capabilities". This is done by a proc-xen.mount unit, which is part of
xen.git. Since the mounting happens "late", other units may be scheduled
before "proc-xen.mount". If these other units make use of
"ConditionVirtualisation=", the virtualization detection returns
incorect results. detect_vm() will set VIRTUALIZATION_XEN because "xen"
is found in sysfs. This value will be cached. Once xenfs is mounted, the
next process that runs detect_vm() will get VIRTUALIZATION_NONE.
This misdetection can be fixed by using
/sys/hypervisor/properties/features, which exports the value returned by
the "XENVER_get_features" hypercall. If the bit XENFEAT_dom0 is set, the
domain is the "hardware domain". It is supposed to have permissions to
access all hardware. The used sysfs file is available since v2.6.31.
The commonly used term "dom0" refers to the control domain which runs
the toolstack and has access to all hardware. But the virtualization
host may be configured such that one dedicated domain becomes the
"hardware domain", and another one the "toolstack domain".
|
|
|
|
|
|
| |
Update detect_vm_xen_dom0 to propagate errors in case reading
/proc/xen/capabilites fails. This does not fix any bugs, it just makes
it consistent with other functions called by detect_vm.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The file /proc/xen/capabilities is only available if xenfs is mounted.
With a classic xenlinux based kernel that file is available
unconditionally. But with a modern pvops based kernel, xenfs must be
mounted before the "capabilities" may appear. xenfs is mounted very late
via .services files provided by the Xen toolstack. Other units may be
scheduled before xenfs is mounted, which will confuse the detection of
VIRTUALIZATION_XEN.
In all Xen enabled kernels, and if that kernel is actually running on
the Xen hypervisor, the "/proc/xen" directory is the reliable indicator
that this instance runs in a "Xen guest".
Adjust the code to check for /proc/xen instead of
/proc/xen/capabilities.
Fixes commit 3f61278b5 ("basic: Bugfix Detect XEN Dom0 as no virtualization")
|
|
|
|
|
| |
We use it all over the place, let's add a #define for it. Makes things
easier greppable, and more explanatory I think.
|
|
|
|
|
|
| |
Instead of contacting PID 1 for dynamic UID/GID lookups for all
UIDs/GIDs that do not qualify as "system" do the more precise check
instead: check if they actually qualify for the "dynamic" range.
|
|
|
|
|
|
|
| |
This adds uid_is_system() and gid_is_system(), similar in style to
uid_is_dynamic(). That a helper like this is useful is illustrated by
the fact that test-condition.c didn't get the check right so far, which
this patch fixes.
|
|
|
|
| |
Also, export these ranges in our pkg-config files.
|
|
|
|
|
| |
Since we're munging the array anyway, we can make the output a bit
nicer too.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
appended string
This adds a new flavour of strextend(), called
strextend_with_separator(), which takes an optional separator string. If
specified, the separator is inserted between each appended string, as
well as before the first one, but only if the original string was
non-empty.
This new call is particularly useful when appending new options to mount
option strings and suchlike, which need to be comma-separated, and
initially start out from an empty string.
|
|
|
|
|
|
|
|
|
|
| |
automatically (#7522)
Let's optimize things a bit, and instead of having to strip whitespace
first before decoding base64, let's do that implicitly while doing so.
Given that base64 was designed the way it was designed specifically to
be tolerant to whitespace changes, it's a good idea to do this
automatically and implicitly.
|
| |
|
|
|
|
|
| |
Before this, chase_symlinks("/../../foo/bar",...) returns //foo/bar.
This removes the unnecessary slash at the head.
|
| |
|