path: root/src/basic/string-util.c
Commit message | Author | Age
* Introduce free_and_strndup and use it in bus-message.c | Zbigniew Jędrzejewski-Szmek | 2018-10-29
    v2: fix error in free_and_strndup()
    When the original and copied message were the same, but shorter than the specified length l, memory would be read past the end of the buffer. A test case is included: a string that had an embedded NUL ("q\0") is used to replace "q".
    v3: Fix one more bug in free_and_strndup and add tests.
    v4: Some style fixes based on review, one more use of free_and_replace, and make the tests more comprehensive.
    (cherry picked from commit 7f546026abbdc56c453a577e52d57159458c3e9c)
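The v2 bug described above comes down to comparing the old and new strings without reading more than l bytes or past a terminating NUL. A minimal sketch of that idea follows. It is not the project's implementation: the name free_and_strndup_sketch and the return convention (1 if replaced, 0 if unchanged, -ENOMEM on failure) are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: replace *p with a copy of at most l bytes of s.
 * Returns 1 if *p changed, 0 if it already held the same string,
 * -ENOMEM on allocation failure. */
int free_and_strndup_sketch(char **p, const char *s, size_t l) {
        assert(p);
        assert(s || l == 0);

        if (!s) {
                if (!*p)
                        return 0;
                free(*p);
                *p = NULL;
                return 1;
        }

        /* Effective length: stop at the first NUL within the first l bytes. */
        const char *e = memchr(s, '\0', l);
        size_t n = e ? (size_t) (e - s) : l;

        /* Compare lengths first, so memcmp() never reads past either string. */
        if (*p && strlen(*p) == n && memcmp(*p, s, n) == 0)
                return 0;

        char *t = malloc(n + 1);
        if (!t)
                return -ENOMEM;
        memcpy(t, s, n);
        t[n] = '\0';

        free(*p);
        *p = t;
        return 1;
}
```

With this sketch, replacing "q" by "q\0" with l = 2 reports "unchanged", which mirrors the test case mentioned in the commit message.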
* Prep v239: Unmask delete_chars() | Sven Eden | 2018-08-24
* Prep v239: string-util.[hc] - Unmasked skip_leading_chars() - Newly utilized by strstrip() | Sven Eden | 2018-08-24
* Prep v239: Uncomment header inclusions that are new or needed now. | Sven Eden | 2018-08-24
* tree-wide: remove Lennart's copyright lines | Lennart Poettering | 2018-08-24
    These lines are generally out-of-date, incomplete and unnecessary. With SPDX and the git repository, much more accurate and fine-grained information about licensing and authorship is available, hence let's drop the per-file copyright notice.
    Of course, removing copyright lines of others is problematic, hence this commit only removes my own lines and leaves all others untouched. It might be nicer if sooner or later those could go away too, making git the only and accurate source of authorship information.
* tree-wide: drop 'This file is part of systemd' blurb | Lennart Poettering | 2018-08-24
    This part of the copyright blurb stems from the GPL use recommendations:
    The concept appears to originate in times where version control was per file, instead of per tree, and was a way to glue the files together. Ultimately, we nowadays don't live in that world anymore, and this information is entirely useless anyway, as people are very welcome to copy these files into any projects they like, and they shouldn't have to change bits that are part of our copyright header for that.
    Hence, let's just get rid of this old cruft, and shorten our codebase a bit.
* basic/ellipsize: do not assume the string is NUL-terminated when length is given | Zbigniew Jędrzejewski-Szmek | 2018-08-24
    oss-fuzz flags this as:
    ==1==WARNING: MemorySanitizer: use-of-uninitialized-value
        0. 0x7fce77519ca5 in ascii_is_valid systemd/src/basic/utf8.c:252:9
        1. 0x7fce774d203c in ellipsize_mem systemd/src/basic/string-util.c:544:13
        2. 0x7fce7730a299 in print_multiline systemd/src/shared/logs-show.c:244:37
        3. 0x7fce772ffdf3 in output_short systemd/src/shared/logs-show.c:495:25
        4. 0x7fce772f5a27 in show_journal_entry systemd/src/shared/logs-show.c:1077:15
        5. 0x7fce772f66ad in show_journal systemd/src/shared/logs-show.c:1164:29
        6. 0x4a2fa0 in LLVMFuzzerTestOneInput systemd/src/fuzz/fuzz-journal-remote.c:64:21
        ...
    I didn't reproduce the issue, but this looks like an obvious error: the length is specified, so we shouldn't use the string with any functions for normal C-strings.
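The point of the fix above is that a buffer passed with an explicit length must be scanned by length, never up to a NUL that may not exist. A minimal sketch of a length-bounded ASCII check is given here. The name ascii_is_valid_n_sketch is hypothetical, and the sketch only tests the high bit of each byte; the real helper may apply further rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: true if the first len bytes of buf are all 7-bit ASCII.
 * Reads exactly len bytes and never looks for a terminating NUL. */
bool ascii_is_valid_n_sketch(const char *buf, size_t len) {
        assert(buf || len == 0);

        for (size_t i = 0; i < len; i++)
                if ((unsigned char) buf[i] >= 128)
                        return false;

        return true;
}
```

Because the loop is bounded by len, the function is safe on buffers that carry no terminator at all, which is the situation the sanitizer report points at.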
* string-util: put together strstrip() from skip_leading_chars() and delete_trailing_chars() | Lennart Poettering | 2018-08-24
* test-ellipsize: add tests for ellipsize_mem, fix bugs | Zbigniew Jędrzejewski-Szmek | 2018-08-24
    First, ellipsize() and ellipsize_mem() should not read past the input buffer. Those functions take an explicit length for the input data, so they should not assume that the buffer is terminated by a NUL.
    Second, ellipsization was off in various cases where wide or multi-byte characters were used.
    We had some basic tests for ellipsize(), but apparently they weren't enough to catch more serious cases.
    Should fix
* basic/string-util: make ellipsize() inline | Zbigniew Jędrzejewski-Szmek | 2018-08-24
    Once the redundant check is removed, it's a very simple wrapper around ellipsize_mem().
* string-util: tweak cellescape() a bit | Lennart Poettering | 2018-08-24
    For short buffer sizes cellescape() was a bit wasteful, as it might suffice to drop a single character to find enough place for the full four byte ellipsis, if that one character was a four character escape. With this rework we'll guarantee to drop the minimum number of characters from the end to fit in the ellipsis.
    If the buffers we write to are large this doesn't matter much. However, if they are short (as they are when talking about the process comm field) then it starts to matter that we put as much information as we can in the space we get.
* basic/string-util: add a convenience function to cescape mostly-ascii fields | Zbigniew Jędrzejewski-Szmek | 2018-08-24
    It's not supposed to be the most efficient, but instead fast and simple to use.
    I kept the logic in ellipsize_mem() to use unicode ellipsis even in non-unicode locales. I'm not quite convinced things should be this way, especially since with this patch it'd actually be simpler to always use "…" in unicode locale and "..." otherwise, but Lennart wanted it this way for some reason.
* string-util: use fflush_and_check() where appropriate | Lennart Poettering | 2018-08-24
* string-util: teach strip_tab_ansi() to deal with CSO sequences | Lennart Poettering | 2018-08-24
    With the recent terminal_urlify() APIs we'll now sometimes generate clickable link CSO sequences. Hence we should also be able to remove them again from strings. This beefs up the logic to do so.
    Follow-up for: 23b27b39d2a3a002ad827a2e8a9872a51495d797
* string-util: tweak ellipsation a bit | Lennart Poettering | 2018-08-24
    This primarily changes two things:
    1. Ellipsation to 0, 1 or 2 characters is now supported. Previously we'd hit an assert if the new length was < 3; this is now permitted. The result strings still won't show much info of course, but the code becomes a bit more generic and robust to use.
    2. If UTF-8 mode is disabled and the input string is pure ASCII, then "..." is used for ellipsation, otherwise (as before) "…". This means on a pure-ASCII system we should remain pure-ASCII, matching behaviour otherwise exposed with special_glyph() and friends.
    Note that we'll use "…" for ellipsation as soon as either the locale settings indicate a UTF-8 mode or the input string already contains non-ASCII unicode characters.
    Testing for these special cases is improved.
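The selection rule in point 2 can be sketched as a small helper. This is an illustration only: the name pick_ellipsis_sketch and the utf8_mode parameter are assumptions, standing in for whatever locale check the real code performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: choose the ellipsis string for an input of len bytes.
 * "..." is used only if UTF-8 mode is off AND the input is pure ASCII. */
const char *pick_ellipsis_sketch(bool utf8_mode, const char *s, size_t len) {
        assert(s || len == 0);

        if (utf8_mode)
                return "\xe2\x80\xa6"; /* U+2026 HORIZONTAL ELLIPSIS in UTF-8 */

        for (size_t i = 0; i < len; i++)
                if ((unsigned char) s[i] >= 128)
                        return "\xe2\x80\xa6"; /* input already non-ASCII */

        return "...";
}
```

The design consequence is that a pure-ASCII system with pure-ASCII input never sees a non-ASCII byte introduced by ellipsation.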
* tree-wide: drop license boilerplate | Zbigniew Jędrzejewski-Szmek | 2018-08-24
    Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc), are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt.
    I also kept any copyright lines. We can probably remove them, but it'd be nice to obtain explicit acks from all involved authors before doing that.
* journalctl: add highlighting for matched substring | Zbigniew Jędrzejewski-Szmek | 2018-05-30
    Red is used for highlighting, the same as grep does. Except when the line is highlighted red already, because it has high priority, in which case plain ansi highlight is used for the matched substring.
    Coloring is implemented for short and cat outputs, and not for other types. I guess we could also add it for verbose output in the future.
* Prep v236: Add missing SPDX-License-Identifier (2/9) src/basic | Sven Eden | 2018-03-26
* string-util: rework strextend() to optionally insert separators between each appended string | Lennart Poettering | 2017-11-28
    This adds a new flavour of strextend(), called strextend_with_separator(), which takes an optional separator string. If specified, the separator is inserted between each appended string, as well as before the first one, but only if the original string was non-empty.
    This new call is particularly useful when appending new options to mount option strings and suchlike, which need to be comma-separated, and initially start out from an empty string.
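The separator rule described above (between appended strings, and before the first one only if the original string was non-empty) can be sketched as follows. This is an assumption-laden illustration, not the project's code: the name strextend_with_separator_sketch, the NULL-sentinel variadic signature, and the failure behaviour are all chosen for the example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: append the NULL-terminated list of strings to *x,
 * inserting separator as described in the commit message.
 * Returns *x, or NULL on allocation failure (in which case *x stays a valid,
 * possibly partially extended string). */
char *strextend_with_separator_sketch(char **x, const char *separator, ...) {
        assert(x);

        size_t len = *x ? strlen(*x) : 0;
        size_t sep_len = separator ? strlen(separator) : 0;
        bool need_sep = len > 0; /* no leading separator on an empty original */
        va_list ap;

        va_start(ap, separator);
        for (;;) {
                const char *t = va_arg(ap, const char *);
                if (!t)
                        break;

                size_t tl = strlen(t);
                char *r = realloc(*x, len + (need_sep ? sep_len : 0) + tl + 1);
                if (!r) {
                        va_end(ap);
                        return NULL;
                }
                *x = r;

                if (need_sep && sep_len > 0) {
                        memcpy(r + len, separator, sep_len);
                        len += sep_len;
                }
                memcpy(r + len, t, tl);
                len += tl;
                r[len] = '\0';
                need_sep = true;
        }
        va_end(ap);

        if (!*x) /* nothing appended to a NULL original: return "" */
                *x = calloc(1, 1);

        return *x;
}
```

For the mount-option use case, starting from an empty string and appending "ro" and "nosuid" with "," yields "ro,nosuid", and a later call appending "nodev" yields "ro,nosuid,nodev".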
* string-util: update strreplace() a bit, use GREEDY_REALLOC() | Lennart Poettering | 2017-11-21
* string-util: when ellipsizing to a length of (size_t) -1, become a NOP | Lennart Poettering | 2017-11-10
    Let's say that (size_t) -1 (i.e. SIZE_T_MAX) is equivalent to "unbounded" ellipsation, i.e. ellipsation as NOP. In which case the relevant functions become little more than strdup()/strndup().
    This is useful to simplify caller code in case we want to turn off ellipsation in certain code paths with minimal caller-side handling for this.
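A minimal sketch of the "unbounded means NOP" convention follows. It is deliberately simplified and makes several assumptions: the name ellipsize_tail_sketch is hypothetical, only ASCII "..." is used, and the ellipsis is placed at the end rather than wherever the real function puts it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: shorten s (old_length bytes) to at most new_length
 * bytes, marking the cut with dots. new_length == SIZE_MAX means "unbounded",
 * so the function degenerates into a plain bounded copy. */
char *ellipsize_tail_sketch(const char *s, size_t old_length, size_t new_length) {
        assert(s || old_length == 0);
        char *r;

        /* The SIZE_MAX test is spelled out for clarity; it also keeps the
         * new_length + 1 allocation below from overflowing. */
        if (new_length == SIZE_MAX || old_length <= new_length) {
                r = malloc(old_length + 1);
                if (!r)
                        return NULL;
                if (old_length > 0)
                        memcpy(r, s, old_length);
                r[old_length] = '\0';
                return r;
        }

        r = malloc(new_length + 1);
        if (!r)
                return NULL;

        /* Lengths below 3 are allowed: the result is just 0, 1 or 2 dots. */
        size_t keep = new_length >= 3 ? new_length - 3 : 0;
        memcpy(r, s, keep);
        memset(r + keep, '.', new_length - keep);
        r[new_length] = '\0';
        return r;
}
```

A caller that wants to switch ellipsation off can pass SIZE_MAX unconditionally instead of branching around the call.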
* Apply missing updates from upstream | Sven Eden | 2017-12-08
* tree-wide: use IN_SET macro (#6977) | Yu Watanabe | 2017-12-08
* build-sys: change all HAVE_DECL_ macros to HAVE_ | Zbigniew Jędrzejewski-Szmek | 2017-11-22
    This is a legacy of autotools, where one detection routine used a different prefix than the others.
    $ git grep -e HAVE_DECL_ -l|xargs sed -i s/HAVE_DECL_/HAVE_/g
* Prep v235: Apply upstream fixes (3/10) [src/basic] | Sven Eden | 2017-08-14
* string-util: optimize strshorten() a bit | Lennart Poettering | 2017-07-20
    There's no reason to determine the full length of the string, it's sufficient to know whether it is larger than the intended size...
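The optimization above amounts to scanning at most l + 1 bytes instead of calling strlen() on the whole string. A minimal sketch, assuming the hypothetical name strshorten_sketch and using memchr() for the bounded scan:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: truncate s in place to at most l bytes.
 * Only the first l + 1 bytes are examined; if no NUL is found there,
 * the string is longer than l and gets cut. */
char *strshorten_sketch(char *s, size_t l) {
        assert(s);

        /* l == SIZE_MAX can never be exceeded, and l + 1 would overflow. */
        if (l < SIZE_MAX && !memchr(s, '\0', l + 1))
                s[l] = '\0';

        return s;
}
```

For a very long string and a small l, the cost is now proportional to l rather than to the string length.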
* basic: cosmetic changes (#6440) | Yu Watanabe | 2017-08-10
* basic: use _unlocked() stdio in strip_tab_ansi() (#6385) | Vito Caputo | 2017-08-10
    Trivial performance boost by explicitly bypassing the implicit locking of stdio.
    This significantly affects common cases of `journalctl` usage:
    Before:
        # time ./journalctl -b -1 > /dev/null
        real 0m26.628s  user 0m26.495s  sys 0m0.125s
        # time ./journalctl -b -1 > /dev/null
        real 0m27.069s  user 0m26.936s  sys 0m0.134s
        # time ./journalctl -b -1 > /dev/null
        real 0m26.727s  user 0m26.607s  sys 0m0.119s
    After:
        # time ./journalctl -b -1 > /dev/null
        real 0m23.394s  user 0m23.244s  sys 0m0.142s
        # time ./journalctl -b -1 > /dev/null
        real 0m23.283s  user 0m23.160s  sys 0m0.121s
        # time ./journalctl -b -1 > /dev/null
        real 0m23.274s  user 0m23.125s  sys 0m0.144s
    Fixes
* treewide: replace homegrown memory_erase with explicit_bzero | Zbigniew Jędrzejewski-Szmek | 2017-07-17
    explicit_bzero was added in glibc 2.25. Make use of it.
    explicit_bzero is hardcoded to zero the memory, so string erase now truncates the string, instead of overwriting it with 'x'. This causes a visible difference only in the journalctl case.
* tree-wide: drop NULL sentinel from strjoin | Zbigniew Jędrzejewski-Szmek | 2017-07-17
    This makes strjoin and strjoina more similar and avoids the useless final argument.
    spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/elogind -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libelogind/sd-bus -I ./src/libelogind/sd-event -I ./src/libelogind/sd-login -I ./src/libelogind/sd-netlink -I ./src/libelogind/sd-network -I ./src/libelogind/sd-hwdb -I ./src/libelogind/sd-device -I ./src/libelogind/sd-id128 -I ./src/libelogind-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/*.c)
    git grep -e '\bstrjoin\b.*NULL' -l|xargs sed -i -r 's/strjoin\((.*), NULL\)/strjoin(\1)/'
    This might have missed a few cases (spatch has a really hard time dealing with _cleanup_ macros), but that's no big issue, they can always be fixed later.
* Prep v232.2: Mask more unneeded functions | Sven Eden | 2017-07-07
* tree-wide: use mfree more | Zbigniew Jędrzejewski-Szmek | 2017-07-05
* Always use unicode ellipsis when ellipsizing | Zbigniew Jędrzejewski-Szmek | 2017-07-05
    We were already unconditionally using the unicode character when the input string was not pure ASCII, leading to different behaviour depending on the input string.
        elogind[1]: Starting printit.service.
        python3[19962]: foooooooooooooooooooooooooooooooooooo…oooo
        python3[19964]: fooąęoooooooooooooooooooooooooooooooo…oooo
        python3[19966]: fooąęoooooooooooooooooooooooooooooooo…ąęąę
        python3[19968]: fooąęoooooooooooooooooąęąęąęąęąęąęąęą…ąęąę
        elogind[1]: Started printit.service.
* string-util: rework memory_erase() to not use GCC optimize attribute (#3812) | Michael Biebl | 2017-07-05
    "#pragma GCC optimize" is merely a convenience to decorate multiple functions with attribute optimize. And the manual has this to say about this attribute:
        This attribute should be used for debugging purposes only. It is not suitable in production code.
    Some versions of GCC also seem to have a problem with this pragma in combination with LTO, resulting in ICEs.
    So use a different approach (indirect the memset call via a volatile function pointer) as implemented in openssl's crypto/mem_clr.c.
    Closes: #3811
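The "volatile function pointer" approach mentioned above can be sketched as below. The names memset_fn_t, memset_func and memory_erase_sketch are illustrative assumptions, and whether this reliably defeats dead-store elimination depends on the compiler; where explicit_bzero() is available it is usually the better choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*memset_fn_t)(void *, int, size_t);

/* The pointer itself is volatile, so the compiler has to load it at run time
 * and cannot assume the call target is memset() and drop the store. */
static memset_fn_t volatile memset_func = memset;

/* Hypothetical sketch: overwrite l bytes at p with 'x'. */
void *memory_erase_sketch(void *p, size_t l) {
        assert(p || l == 0);

        if (l == 0)
                return p;

        return memset_func(p, 'x', l);
}
```

The fill byte 'x' follows the behaviour described in the later explicit_bzero commit, which notes that the older helper overwrote strings with 'x' rather than zeroing them.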
* bootctl: move toupper() implementation to string-util.h | Lennart Poettering | 2017-07-05
    We already have tolower() calls there, hence let's unify this in one place. Also, update the code to only use ASCII operations, so that we don't end up being locale dependent.
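A locale-independent upper-casing helper only needs to touch the bytes 'a' to 'z'. A minimal sketch under that assumption follows; the names ascii_toupper_sketch and ascii_strupper_sketch are chosen for the example and may not match the real helpers.

```c
#include <assert.h>

/* Hypothetical sketch: upper-case a single ASCII letter, leave every other
 * byte (including non-ASCII ones) untouched, regardless of the locale. */
char ascii_toupper_sketch(char x) {
        if (x >= 'a' && x <= 'z')
                return (char) (x - 'a' + 'A');

        return x;
}

/* In-place variant for NUL-terminated strings. */
char *ascii_strupper_sketch(char *s) {
        assert(s);

        for (char *p = s; *p; p++)
                *p = ascii_toupper_sketch(*p);

        return s;
}
```

Unlike toupper() from <ctype.h>, the result here cannot change when the process locale does.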
* tree-wide: make ++/-- usage consistent WRT spacing | Vito Caputo | 2017-06-16
    Throughout the tree there's spurious use of spaces separating ++ and -- operators from their respective operands. Make ++ and -- operator consistent with the majority of existing uses; discard the spaces.
* Prep v229: Add missing fixes from upstream [1/6] src/basic | Sven Eden | 2017-05-17
* basic: add ascii_strcasecmp_nn() call | Lennart Poettering | 2017-05-17
    In contrast to ascii_strcasecmp_n() it takes two character buffers with their individual lengths. It will then compare the buffers up to the smaller size of the two buffers, and finally the lengths themselves.
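The comparison order described above (common prefix first, then the lengths) can be sketched as follows. All names carry a _sketch suffix because they are illustrative, not the project's definitions.

```c
#include <assert.h>
#include <stddef.h>

static int ascii_tolower_sketch(int c) {
        return c >= 'A' && c <= 'Z' ? c - 'A' + 'a' : c;
}

/* Case-insensitive (ASCII only) comparison of exactly n bytes. */
int ascii_strcasecmp_n_sketch(const char *a, const char *b, size_t n) {
        assert((a && b) || n == 0);

        for (size_t i = 0; i < n; i++) {
                int x = ascii_tolower_sketch((unsigned char) a[i]);
                int y = ascii_tolower_sketch((unsigned char) b[i]);

                if (x != y)
                        return x - y;
        }

        return 0;
}

/* Compare two buffers with individual lengths: first the common prefix,
 * then, if that is equal, the lengths themselves. */
int ascii_strcasecmp_nn_sketch(const char *a, size_t n, const char *b, size_t m) {
        int r = ascii_strcasecmp_n_sketch(a, b, n < m ? n : m);
        if (r != 0)
                return r;

        return (n > m) - (n < m);
}
```

Since both lengths are explicit, neither buffer needs to be NUL-terminated.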
* basic: add new ascii_strcasecmp_n() call | Lennart Poettering | 2017-05-17
* basic: introduce generic ascii_strlower_n() call and make use of it everywhere | Lennart Poettering | 2017-05-17
* utf8.[ch] et al: use char32_t and char16_t instead of int, int32_t, int16_t | Shawn Landden | 2017-05-17
    Rework C11 utf8.[ch] to use char32_t instead of uint32_t when referring to unicode chars, to make things more expressive.
    [ @zonque:
      * rebased to current master
      * use AC_CHECK_DECLS to detect availability of char{16,32}_t
      * make utf8_encoded_to_unichar() return int ]
* Prep v228: Add remaining updates from upstream (1/3) | Sven Eden | 2017-04-26
    The util.[hc] files have been stripped of a lot of functions, that got sorted into various new files representing the type of utility. This commit adds the missing files.