summaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAge
* core: add "invocation ID" concept to service managerLennart Poettering2017-07-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a new invocation ID concept to the service manager. The invocation ID identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is generated each time a unit moves from and inactive to an activating or active state. The primary usecase for this concept is to connect the runtime data PID 1 maintains about a service with the offline data the journal stores about it. Previously we'd use the unit name plus start/stop times, which however is highly racy since the journal will generally process log data after the service already ended. The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel, except that it applies to an individual unit instead of the whole system. The invocation ID is passed to the activated processes as environment variable. It is additionally stored as extended attribute on the cgroup of the unit. The latter is used by journald to automatically retrieve it for each log logged message and attach it to the log entry. The environment variable is very easily accessible, even for unprivileged services. OTOH the extended attribute is only accessible to privileged processes (this is because cgroupfs only supports the "trusted." xattr namespace, not "user."). The environment variable may be altered by services, the extended attribute may not be, hence is the better choice for the journal. Note that reading the invocation ID off the extended attribute from journald is racy, similar to the way reading the unit name for a logging process is. This patch adds APIs to read the invocation ID to sd-id128: sd_id128_get_invocation() may be used in a similar fashion to sd_id128_get_boot(). PID1's own logging is updated to always include the invocation ID when it logs information about a unit. A new bus call GetUnitByInvocationID() is added that allows retrieving a bus path to a unit by its invocation ID. The bus path is built using the invocation ID, thus providing a path for referring to a unit that is valid only for the current runtime cycleof it. Outlook for the future: should the kernel eventually allow passing of cgroup information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we can alter the invocation ID to be generated as hash from that rather than entirely randomly. This way we can derive the invocation race-freely from the messages.
* util: use SPECIAL_ROOT_SLICE macro where appropriateLennart Poettering2017-07-05
|
* log: minor fixesLennart Poettering2017-07-05
| | | | | Most important is a fix to negate the error number if necessary, before we first access it.
* basic/fileio: we always have O_TMPFILE nowYann E. MORIN2017-07-05
| | | | | | | | | | fileio makes use of O_TMPFILE when it is available. We now always have O_TMPFILE, defined in missing.h if missing from the toolchain headers. Have fileio include missing.h and drop the guards around the use of O_TMPFILE.
* missing.h: add missing definitions for __O_TMPFILEYann E. MORIN2017-07-05
| | | | | | | | | | | | | | | | | | | Currently, a missing __O_TMPFILE was only defined for i386 and x86_64, leaving any other architectures with an "old" toolchain fail miserably at build time: src/import/export-raw.c: In function 'reflink_snapshot': src/import/export-raw.c:271:26: error: 'O_TMPFILE' undeclared (first use in this function) new_fd = open(d, O_TMPFILE|O_CLOEXEC|O_NOCTTY|O_RDWR, 0600); ^ __O_TMPFILE (and O_TMPFILE) are available since glibc 2.19. However, a lot of existing toolchains are still using glibc-2.18, and some even before that, and it is not really possible to update those toolchains. Instead of defining it only for i386 and x86_64, define __O_TMPFILE with the specific values for those archs where it is different from the generic value. Use the values as found in the Linux kernel (v4.8-rc3, current as of time of commit).
* bus-util: generalize helper for ID128 prpoertiesLennart Poettering2017-07-05
| | | | This way, we can make use of this in other code, too.
* strv: fix STRV_FOREACH_BACKWARDS() to be a single statement onlyLennart Poettering2017-07-05
| | | | | | Let's make sure people invoking STRV_FOREACH_BACKWARDS() as a single statement of an if statement don't fall into a trap, and find the tail for the list via strv_length().
* execute: move suppression of HOME=/ and SHELL=/bin/nologin into user-util.cLennart Poettering2017-07-05
| | | | | | | This adds a new call get_user_creds_clean(), which is just like get_user_creds() but returns NULL in the home/shell parameters if they contain no useful information. This code previously lived in execute.c, but by generalizing this we can reuse it in run.c.
* bus-util: turn on exit-on-disconnect for all command line toolsLennart Poettering2017-07-05
| | | | | | bus_connect_transport() is exclusively used from our command line tools, hence let's set exit-on-disconnect for all of them, making behaviour a bit nicer in case dbus-daemon goes down.
* sd-bus: optionally, exit process or event loop on disconnectLennart Poettering2017-07-05
| | | | | | | | | | | | Old libdbus has a feature that the process is terminated whenever the the bus connection receives a disconnect. This is pretty useful on desktop apps (where a disconnect indicates session termination), as well as on command line apps (where we really shouldn't stay hanging in most cases if dbus daemon goes down). Add a similar feature to sd-bus, but make it opt-in rather than opt-out, like it is on libdbus. Also, if the bus is attached to an event loop just exit the event loop rather than the the whole process.
* sd-bus: when the server-side disconnects, make sure to dispatch all tracking ↵Lennart Poettering2017-07-05
| | | | | | | objects immediately If the server side kicks us from the bus, from our view no names are on the bus anymore, hence let's make sure to dispatch all tracking objects immediately.
* sd-bus: ensure we don't dispatch track objects while we are adding names to themLennart Poettering2017-07-05
| | | | | | | | | | | In order to add a name to a bus tracking object we need to do some bus operations: we need to check if the name already exists and add match for it. Both are synchronous bus calls. While processing those we need to make sure that the tracking object is not dispatched yet, as it might still be empty, but is not going to be empty for very long. hence, block dispatching by removing the object from the dispatch queue while adding it, and readding it on error.
* sd-bus: split out handling of reply callbacks on close into its own functionLennart Poettering2017-07-05
| | | | | | | When a bus connection is closed we dispatch all reply callbacks. Do so in a new function if its own. No behaviour changes.
* terminal-util: remove unnecessary check of result of isatty() (#4000)0xAX2017-07-05
| | | | | | | | | | | | | After the call of the isatty() we check its result twice in the open_terminal(). There are no sense to check result of isatty() that it is less than zero and return -errno, because as described in documentation: isatty() returns 1 if fd is an open file descriptor referring to a terminal; otherwise 0 is returned, and errno is set to indicate the error. So it can't be less than zero.
* terminal-util: use getenv_bool for $SYSTEMD_COLORSZbigniew Jędrzejewski-Szmek2017-07-05
| | | | | | | This changes the semantics a bit: before, SYSTEMD_COLORS= would be treated as "yes", same as SYSTEMD_COLORS=xxx and SYSTEMD_COLORS=1, and only SYSTEMD_COLORS=0 would be treated as "no". Now, only valid booleans are treated as "yes". This actually matches how $SYSTEMD_COLORS was announced in NEWS.
* journald: do not create split journals for dynamic usersZbigniew Jędrzejewski-Szmek2017-07-05
| | | | | Dynamic users should be treated like system users, and their logs should end up in the main system journal.
* logind: update empty and "infinity" handling for [User]TasksMax (#3835)Tejun Heo2017-07-05
| | | | | | | | | | | | | | | | | The parsing functions for [User]TasksMax were inconsistent. Empty string and "infinity" were interpreted as no limit for TasksMax but not accepted for UserTasksMax. Update them so that they're consistent with other knobs. * Empty string indicates the default value. * "infinity" indicates no limit. While at it, replace opencoded (uint64_t) -1 with CGROUP_LIMIT_MAX in TasksMax handling. v2: Update empty string to indicate the default value as suggested by Zbigniew Jędrzejewski-Szmek. v3: Fixed empty UserTasksMax handling.
* elogind: ignore lack of tty when checking whether colors should be enabledZbigniew Jędrzejewski-Szmek2017-07-05
| | | | | | | | | | | When started by the kernel, we are connected to the console, and we'll set TERM properly to some value in fixup_environment(). We'll then enable or disable colors based on the value of $SYSTEMD_COLORS and $TERM. When reexecuting, TERM should be already set, so we can use this value. Effectively, behaviour is the same as before affd7ed1a was reverted, but instead of reopening the console before configuring color output, we just ignore what stdout is connected to and decide based on the variables only.
* hostnamectl: rework pretty hostname validation (#3985)Lennart Poettering2017-07-05
| | | | | | | | | | | Rework 17eb9a9ddba3f03fcba33445c1c1eedeb948da04 a bit. Let's make sure we don't clobber the input parameter args[1], following our coding style to not clobber parameters unless explicitly indicated. (in particular, as we don't want to have our changes appear in the command line shown in "ps"...) No functional change.
* core: cache last CPU usage counter, before destorying a cgroupLennart Poettering2017-07-05
| | | | | | | It is useful for clients to be able to read the last CPU usage counter value of a unit even if the unit is already terminated. Hence, before destroying a cgroup's cgroup cache the last CPU usage counter and return it if the cgroup is gone.
* bus-util: make sure map_basic() returns EOPNOTSUPP if called for an unknown typeLennart Poettering2017-07-05
| | | | Make sure we return proper errors for types not understood yet.
* bus-util: treat an empty string as a NULLLennart Poettering2017-07-05
| | | | | Instead of ignoring empty strings retrieved via the bus, treat them as NULL, as it's customary in elogind.
* bus-util: support mapping signed integers with bus_map_properties()Lennart Poettering2017-07-05
| | | | | | Let's make sure we can read the exit code/status properties exposed by PID 1 properly. Let's reuse the existing code for unsigned fields, as we just use it to copy words around, and don't calculate it.
* core: rename cg_unified() to cg_all_unified()Tejun Heo2017-07-05
| | | | | | | | | | | A following patch will update cgroup handling so that the elogind controller (/sys/fs/cgroup/elogind) can use the unified hierarchy even if the kernel resource controllers are on the legacy hierarchies. This would require distinguishing whether all controllers are on cgroup v2 or only the elogind controller is. In preparation, this patch renames cg_unified() to cg_all_unified(). This patch doesn't cause any functional changes.
* core: use the unified hierarchy for the elogind cgroup controller hierarchyTejun Heo2017-07-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, elogind uses either the legacy hierarchies or the unified hierarchy. When the legacy hierarchies are used, elogind uses a named legacy hierarchy mounted on /sys/fs/cgroup/elogind without any kernel controllers for process management. Due to the shortcomings in the legacy hierarchy, this involves a lot of workarounds and complexities. Because the unified hierarchy can be mounted and used in parallel to legacy hierarchies, there's no reason for elogind to use a legacy hierarchy for management even if the kernel resource controllers need to be mounted on legacy hierarchies. It can simply mount the unified hierarchy under /sys/fs/cgroup/elogind and use it without affecting other legacy hierarchies. This disables a significant amount of fragile workaround logics and would allow using features which depend on the unified hierarchy membership such bpf cgroup v2 membership test. In time, this would also allow deleting the said complexities. This patch updates elogind so that it prefers the unified hierarchy for the elogind cgroup controller hierarchy when legacy hierarchies are used for kernel resource controllers. * cg_unified(@controller) is introduced which tests whether the specific controller in on unified hierarchy and used to choose the unified hierarchy code path for process and service management when available. Kernel controller specific operations remain gated by cg_all_unified(). * "elogind.legacy_elogind_cgroup_controller" kernel argument can be used to force the use of legacy hierarchy for elogind cgroup controller. * nspawn: By default nspawn uses the same hierarchies as the host. If UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all. If 0, legacy for all. * nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of three options - legacy, only elogind controller on unified, and unified. The value is passed into mount setup functions and controls cgroup configuration. * nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount option is moved to mount_legacy_cgroup_hierarchy() so that it can take an appropriate action depending on the configuration of the host. v2: - CGroupUnified enum replaces open coded integer values to indicate the cgroup operation mode. - Various style updates. v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2. v4: Restored legacy container on unified host support and fixed another bug in detect_unified_cgroup_hierarchy().
* core: add Ref()/Unref() bus calls for unitsLennart Poettering2017-07-05
| | | | | | | | | | | | | | | | This adds two (privileged) bus calls Ref() and Unref() to the Unit interface. The two calls may be used by clients to pin a unit into memory, so that various runtime properties aren't flushed out by the automatic GC. This is necessary to permit clients to race-freely acquire runtime results (such as process exit status/code or accumulated CPU time) on successful service termination. Ref() and Unref() are fully recursive, hence act like the usual reference counting concept in C. Taking a reference is a privileged operation, as this allows pinning units into memory which consumes resources. Transient units may also gain a reference at the time of creation, via the new AddRef property (that is only defined for transient units at the time of creation).
* sd-bus: add a "recursive" mode to sd_bus_trackLennart Poettering2017-07-05
| | | | | | | | | | This adds an optional "recursive" counting mode to sd_bus_track. If enabled adding the same name multiple times to an sd_bus_track object is counted individually, so that it also has to be removed the same number of times before it is gone again from the tracking object. This functionality is useful for implementing local ref counted objects that peers make take references on.
* core: add cgroup CPU controller support on the unified hierarchyTejun Heo2017-07-05
| | | | | | | | | | | | | | | | | | | | | | | | | Unfortunately, due to the disagreements in the kernel development community, CPU controller cgroup v2 support has not been merged and enabling it requires applying two small out-of-tree kernel patches. The situation is explained in the following documentation. https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu While it isn't clear what will happen with CPU controller cgroup v2 support, there are critical features which are possible only on cgroup v2 such as buffered write control making cgroup v2 essential for a lot of workloads. This commit implements elogind CPU controller support on the unified hierarchy so that users who choose to deploy CPU controller cgroup v2 support can easily take advantage of it. On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max" replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us". [Startup]CPUWeight config options are added with the usual compat translation. CPU quota settings remain unchanged and apply to both legacy and unified hierarchies. v2: - Error in man page corrected. - CPU config application in cgroup_context_apply() refactored. - CPU accounting now works on unified hierarchy.
* add a new tool for creating transient mount and automount unitsLennart Poettering2017-07-05
| | | | | | | | | | | | | | | | | | | | | | | This adds "elogind-mount" which is for transient mount and automount units what "elogind-run" is for transient service, scope and timer units. The tool allows establishing mounts and automounts during runtime. It is very similar to the usual /bin/mount commands, but can pull in additional dependenices on access (for example, it pulls in fsck automatically), an take benefit of the automount logic. This tool is particularly useful for mount removable file systems (such as USB sticks), as the automount logic (together with automatic unmount-on-idle), as well as automatic fsck on first access ensure that the removable file system has a high chance to remain in a fully clean state even when it is unplugged abruptly, and returns to a clean state on the next re-plug. This is a follow-up for #2471, as it adds a simple client-side for the transient automount logic added in that PR. In later work it might make sense to invoke this tool automatically from udev rules in order to implement a simpler and safer version of removable media management á la udisks.
* util-lib: unify parsing of nice level valuesLennart Poettering2017-07-05
| | | | | | | | This adds parse_nice() that parses a nice level and ensures it is in the right range, via a new nice_is_valid() helper. It then ports over a number of users to this. No functional changes.
* fileio: fix MIN/MAX mixup (#3896)Vito Caputo2017-07-05
| | | | | The intention is to clamp the value to READ_FULL_BYTES_MAX, which would be the minimum of the two.
* basic/set: remove some spurious spacesZbigniew Jędrzejewski-Szmek2017-07-05
|
* fileio: fix read_full_stream() bugs (#3887)Vito Caputo2017-07-05
| | | | | | | | | | | | | read_full_stream() _always_ allocated twice the memory needed, due to only breaking the realloc() && fread() loop when fread() returned 0, requiring another iteration and exponentially enlarged buffer just to discover the EOF condition. This also caused file sizes >2MiB && <= 4MiB to erroneously be treated as E2BIG, due to the inappropriately doubled buffer size exceeding 4*1024*1024. Also made the 4*1024*1024 magic number a READ_FULL_BYTES_MAX constant.
* util-lib: add parse_percent_unbounded() for percentages over 100% (#3886)David Michael2017-07-05
| | | | This permits CPUQuota to accept greater values as documented.
* util-lib: make timestamp generation and parsing reversible (#3869)Lennart Poettering2017-07-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch improves parsing and generation of timestamps and calendar specifications in two ways: - The week day is now always printed in the abbreviated English form, instead of the locale's setting. This makes sure we can always parse the week day again, even if the locale is changed. Given that we don't follow locale settings for printing timestamps in any other way either (for example, we always use 24h syntax in order to make uniform parsing possible), it only makes sense to also stick to a generic, non-localized form for the timestamp, too. - When parsing a timestamp, the local timezone (in its DST or non-DST name) may be specified, in addition to "UTC". Other timezones are still not supported however (not because we wouldn't want to, but mostly because libc offers no nice API for that). In itself this brings no new features, however it ensures that any locally formatted timestamp's timezone is also parsable again. These two changes ensure that the output of format_timestamp() may always be passed to parse_timestamp() and results in the original input. The related flavours for usec/UTC also work accordingly. Calendar specifications are extended in a similar way. The man page is updated accordingly, in particular this removes the claim that timestamps elogind prints wouldn't be parsable by elogind. They are now. The man page previously showed invalid timestamps as examples. This has been removed, as the man page shouldn't be a unit test, where such negative examples would be useful. The man page also no longer mentions the names of internal functions, such as format_timestamp_us() or UNIX error codes such as EINVAL.
* clean-ipc: debug log about all remove IPC objectsLennart Poettering2017-07-05
|
* core: add RemoveIPC= settingLennart Poettering2017-07-05
| | | | | | | | | | | | | | | | | | This adds the boolean RemoveIPC= setting to service, socket, mount and swap units (i.e. all unit types that may invoke processes). if turned on, and the unit's user/group is not root, all IPC objects of the user/group are removed when the service is shut down. The life-cycle of the IPC objects is hence bound to the unit life-cycle. This is particularly relevant for units with dynamic users, as it is essential that no objects owned by the dynamic users survive the service exiting. In fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set. In order to communicate the UID/GID of an executed process back to PID 1 this adds a new "user lookup" socket pair, that is inherited into the forked processes, and closed before the exec(). This is needed since we cannot do NSS from PID 1 due to deadlock risks, However need to know the used UID/GID in order to clean up IPC owned by it if the unit shuts down.
* virt: detect bhyve (FreeBSD hypervisor) (#3840)Leonardo Brondani Schenkel2017-07-05
| | | | | The CPUID and DMI vendor strings do not seem to be documented. Values were found experimentally and by inspecting the source code.
* clean-ipc: shorten code a bitLennart Poettering2017-07-05
|
* clean-ipc: don't filter out '.' and '..' twiceLennart Poettering2017-07-05
|
* core: inherit TERM from PID 1 for all services started on /dev/consoleLennart Poettering2017-07-05
| | | | | | | This way, invoking nspawn from a shell in the best case inherits the TERM setting all the way down into the login shell spawned in the container. Fixes: #3697
* string-util: rework memory_erase() to not use GCC optimize attribute (#3812)Michael Biebl2017-07-05
| | | | | | | | | | | | | | | | | "#pragma GCC optimize" is merely a convenience to decorate multiple functions with attribute optimize. And the manual has this to say about this attribute: This attribute should be used for debugging purposes only. It is not suitable in production code. Some versions of GCC also seem to have a problem with this pragma in combination with LTO, resulting in ICEs. So use a different approach (indirect the memset call via a volatile function pointer) as implemented in openssl's crypto/mem_clr.c. Closes: #3811
* util-lib: rework /tmp and /var/tmp handling codeLennart Poettering2017-07-05
| | | | | | | | | | | | | | | | | | | | Beef up the existing var_tmp() call, rename it to var_tmp_dir() and add a matching tmp_dir() call (the former looks for the place for /var/tmp, the latter for /tmp). Both calls check $TMPDIR, $TEMP, $TMP, following the algorithm Python3 uses. All dirs are validated before use. secure_getenv() is used in order to limite exposure in suid binaries. This also ports a couple of users over to these new APIs. The var_tmp() return parameter is changed from an allocated buffer the caller will own to a const string either pointing into environ[], or into a static const buffer. Given that environ[] is mostly considered constant (and this is exposed in the very well-known getenv() call), this should be OK behaviour and allows us to avoid memory allocations in most cases. Note that $TMPDIR and friends override both /var/tmp and /tmp usage if set.
* Add enable_disable() helperZbigniew Jędrzejewski-Szmek2017-07-05
| | | | | In this patch "enabled" and "disabled" is used exclusively, but "enable" and "disable" forms are need for the following patch.
* bootctl: move toupper() implementation to string-util.hLennart Poettering2017-07-05
| | | | | | We already have tolower() calls there, hence let's unify this at one place. Also, update the code to only use ASCII operations, so that we don't end up being locale dependant.
* core: add a concept of "dynamic" user ids, that are allocated as long as a ↵Lennart Poettering2017-07-05
| | | | | | | | | | | | | | | | | | | service is running This adds a new boolean setting DynamicUser= to service files. If set, a new user will be allocated dynamically when the unit is started, and released when it is stopped. The user ID is allocated from the range 61184..65519. The user will not be added to /etc/passwd (but an NSS module to be added later should make it show up in getent passwd). For now, care should be taken that the service writes no files to disk, since this might result in files owned by UIDs that might get assigned dynamically to a different service later on. Later patches will tighten sandboxing in order to ensure that this cannot happen, except for a few selected directories. A simple way to test this is: elogind-run -p DynamicUser=1 /bin/sleep 99999
* sysusers: move various user credential validity checks to src/basic/Lennart Poettering2017-07-05
| | | | | | | This way we can reuse them for validating User=/Group= settings in unit files (to be added in a later commit). Also, add some tests for them.
* core: introduce MemorySwapMax=WaLyong Cho2017-07-05
| | | | | Similar to MemoryMax=, MemorySwapMax= limits swap usage. This controls controls "memory.swap.max" attribute in unified cgroup.
* cgroup: detect cgroup namespacesChristian Brauner2017-07-05
| | | | | - define CLONE_NEWCGROUP - add fun to detect whether cgroup namespaces are supported
* Revert "build-sys: hide magic section variables from exported symbols"David Herrmann2017-07-05
| | | | | | | This reverts commit aac7c5ed8bc6ffaba417b9c0b87bcf342865431b. This visibility bug originated in ld.gold and has been fixed upstream: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=5417c94d1a944d1a27f99240e5d62a6d7cd324f1