Diffstat (limited to 'doc/sed.texi')
-rw-r--r-- | doc/sed.texi | 679 |
1 files changed, 652 insertions, 27 deletions
diff --git a/doc/sed.texi b/doc/sed.texi index 121e405..7cc67d6 100644 --- a/doc/sed.texi +++ b/doc/sed.texi @@ -34,7 +34,7 @@ This file documents version @value{VERSION} of @value{SSED}, a stream editor. -Copyright @copyright{} 1998-2016 Free Software Foundation, Inc. +Copyright @copyright{} 1998-2017 Free Software Foundation, Inc. @quotation Permission is granted to copy, distribute and/or modify this document @@ -484,6 +484,7 @@ $ echo | sed 'Q42' ; echo $? * Other Commands:: Less frequently used commands * Programming Commands:: Commands for @command{sed} gurus * Extended Commands:: Commands specific of @value{SSED} +* Multiple commands syntax:: Extension for easier scripting @end menu @node sed script overview @@ -586,6 +587,7 @@ thus should be terminated with newlines or be placed at the end of a @var{script} or @var{script-file}. Commands can also be preceded with optional non-significant whitespace characters. +@xref{Multiple commands syntax}. @@ -903,12 +905,12 @@ while in general flags for the @code{s} command show their effect just once. This behavior, although documented, might change in future versions. -@item w @var{file-name} +@item w @var{filename} @cindex Text, writing to a file after substitution @cindex @value{SSEDEXT}, @file{/dev/stdout} file @cindex @value{SSEDEXT}, @file{/dev/stderr} file If the substitution was made, then write out the result to the named file. -As a @value{SSED} extension, two special values of @var{file-name} are +As a @value{SSED} extension, two special values of @var{filename} are supported: @file{/dev/stderr}, which writes the result to the standard error, and @file{/dev/stdout}, which writes to the standard output.@footnote{This is equivalent to @code{p} unless the @option{-i} @@ -1149,7 +1151,7 @@ hello @codequoteundirected off @codequotebacktick off -Leading whitespaces after the @code{a} command are ignored. +Leading whitespace after the @code{a} command is ignored. The text to add is read until the end of the line. 
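The hunks above reword the rule that leading whitespace after the `a`, `i` and `c` commands is ignored. The sketch below makes that rule easy to check for all three commands side by side. It assumes GNU sed, because the one-line `a text`, `i text` and `c text` forms are GNU extensions, and `seq` from GNU coreutils:

```shell
#!/bin/sh
# Sketch assuming GNU sed (one-line a/i/c forms are GNU extensions).
# The spaces between the command letter and "hello" are skipped in each case.
seq 2 | sed '1a   hello'   # prints 1, hello, 2
seq 2 | sed '1i   hello'   # prints hello, 1, 2
seq 2 | sed '1c   hello'   # prints hello, 2
```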
@@ -1238,7 +1240,7 @@ hello @codequoteundirected off @codequotebacktick off -Leading whitespaces after the @code{i} command are ignored. +Leading whitespace after the @code{i} command is ignored. The text to add is read until the end of the line. @item i\ @@ -1319,7 +1321,7 @@ hello @codequoteundirected off @codequotebacktick off -Leading whitespaces after the @code{c} command are ignored. +Leading whitespace after the @code{c} command is ignored. The text to add is read until the end of the line. @item c\ @@ -1464,7 +1466,7 @@ file will then be reread and inserted on each of the addressed lines. @cindex @value{SSEDEXT}, @file{/dev/stdout} file @cindex @value{SSEDEXT}, @file{/dev/stderr} file Write the pattern space to @var{filename}. -As a @value{SSED} extension, two special values of @var{file-name} are +As a @value{SSED} extension, two special values of @var{filename} are supported: @file{/dev/stderr}, which writes the result to the standard error, and @file{/dev/stdout}, which writes to the standard output.@footnote{This is equivalent to @code{p} unless the @option{-i} @@ -1706,7 +1708,283 @@ script in most multibyte locales (including UTF-8 locales). @end table +@node Multiple commands syntax +@section Multiple commands syntax + +@c POSIX says: +@c Editing commands other than {...}, a, b, c, i, r, t, w, :, and # +@c can be followed by a <semicolon>, optional <blank> characters, and +@c another editing command. However, when an s editing command is used +@c with the w flag, following it with another command in this manner +@c produces undefined results. + +There are several methods to specify multiple commands in a @command{sed} +program. + +Using newlines is most natural when running a sed script from a file +(using the @option{-f} option). + +On the command line, all @command{sed} commands may be separated by newlines. 
+Alternatively, you may specify each command as an argument to an @option{-e} +option: + +@codequoteundirected on +@codequotebacktick on +@example +@group +$ seq 6 | sed '1d +3d +5d' +2 +4 +6 + +$ seq 6 | sed -e 1d -e 3d -e 5d +2 +4 +6 +@end group +@end example +@codequoteundirected off +@codequotebacktick off + +A semicolon (@samp{;}) may be used to separate most simple commands: + +@codequoteundirected on +@codequotebacktick on +@example +@group +$ seq 6 | sed '1d;3d;5d' +2 +4 +6 +@end group +@end example +@codequoteundirected off +@codequotebacktick off + +The @code{@{}, @code{@}}, @code{b}, @code{t}, @code{T} and @code{:} commands can +also be separated with a semicolon (this is a non-portable @value{SSED} extension). + +@codequoteundirected on +@codequotebacktick on +@example +@group +$ seq 4 | sed '@{1d;3d@}' +2 +4 + +$ seq 6 | sed '@{1d;3d@};5d' +2 +4 +6 +@end group +@end example +@codequoteundirected off +@codequotebacktick off + +Labels used in @code{b}, @code{t}, @code{T} and @code{:} commands are read +until a semicolon or a newline. Leading and trailing whitespace is ignored. In +the examples below, the label is @samp{x}. The first example works +with @value{SSED}. The second is a portable equivalent. For more +information about branching and labels, see @ref{Branching and flow +control}. +@codequoteundirected on +@codequotebacktick on +@example +@group +$ seq 3 | sed '/1/b x ; s/^/=/ ; :x ; 3d' +1 +=2 + +$ seq 3 | sed -e '/1/bx' -e 's/^/=/' -e ':x' -e '3d' +1 +=2 +@end group +@end example +@codequoteundirected off +@codequotebacktick off + + + +@subsection Commands Requiring a newline + +The following commands cannot be separated by a semicolon and +require a newline: + +@table @asis + +@item @code{a}, @code{c}, @code{i} (append/change/insert) + +All characters following the @code{a}, @code{c} and @code{i} commands are taken +as the text to append/change/insert.
Using a semicolon leads to +undesirable results: + +@codequoteundirected on +@codequotebacktick on +@example +@group +$ seq 2 | sed '1aHello ; 2d' +1 +Hello ; 2d +2 +@end group +@end example +@codequoteundirected off +@codequotebacktick off + +Separate the commands using @option{-e} or a newline: + +@codequoteundirected on +@codequotebacktick on +@example +@group +$ seq 2 | sed -e 1aHello -e 2d +1 +Hello + +$ seq 2 | sed '1aHello +2d' +1 +Hello +@end group +@end example +@codequoteundirected off +@codequotebacktick off + +Note that specifying the text to add (@samp{Hello}) immediately +after @code{a}, @code{c} or @code{i} is itself a @value{SSED} extension. +A portable, POSIX-compliant alternative is: + +@codequoteundirected on +@codequotebacktick on +@example +@group +$ seq 2 | sed '1a\ +Hello +2d' +1 +Hello +@end group +@end example +@codequoteundirected off +@codequotebacktick off + +@item @code{#} (comment) + +All characters following @samp{#} until the next newline are ignored. + +@codequoteundirected on +@codequotebacktick on +@example +@group +$ seq 3 | sed '# this is a comment ; 2d' +1 +2 +3 + + +$ seq 3 | sed '# this is a comment +2d' +1 +3 +@end group +@end example +@codequoteundirected off +@codequotebacktick off + +@item @code{r}, @code{R}, @code{w}, @code{W} (reading and writing files) + +The @code{r}, @code{R}, @code{w} and @code{W} commands parse the filename +until the end of the line. If whitespace, comments or semicolons are found, +they will be included in the filename, leading to unexpected results: + +@codequoteundirected on +@codequotebacktick on +@example +@group +$ seq 2 | sed '1w hello.txt ; 2d' +1 +2 + +$ ls -log +total 4 +-rw-rw-r-- 1 2 Jan 23 23:03 hello.txt ; 2d + +$ cat 'hello.txt ; 2d' +1 +@end group +@end example +@codequoteundirected off +@codequotebacktick off + +Note that @command{sed} silently ignores read/write errors in +@code{r}, @code{R}, @code{w} and @code{W} commands (such as missing files).
+In the following example, @command{sed} tries to read a file named +@file{hello.txt ; N}. The file is missing, and the error is silently +ignored: + +@codequoteundirected on +@codequotebacktick on +@example +@group +$ echo x | sed '1rhello.txt ; N' +x +@end group +@end example +@codequoteundirected off +@codequotebacktick off + +@item @code{e} (command execution) + +Any characters following the @code{e} command until the end of the line +will be sent to the shell. If whitespace, comments or semicolons are found, +they will be included in the shell command, leading to unexpected results: + +@codequoteundirected on +@codequotebacktick on +@example +@group +$ echo a | sed '1e touch foo#bar' +a + +$ ls -1 +foo#bar + +$ echo a | sed '1e touch foo ; s/a/b/' +sh: 1: s/a/b/: not found +a +@end group +@end example +@codequoteundirected off +@codequotebacktick off + + +@item @code{s///[we]} (substitute with @code{e} or @code{w} flags) + +In a substitution command, the @code{w} flag writes the substitution +result to a file, and the @code{e} flag executes the substitution result +as a shell command. As with the @code{r/R/w/W/e} commands, these +must be terminated with a newline. If whitespace, comments or semicolons +are found, they will be included in the shell command or filename, leading to +unexpected results: + +@codequoteundirected on +@codequotebacktick on +@example +@group +$ echo a | sed 's/a/b/w1.txt#foo' +b + +$ ls -1 +1.txt#foo +@end group +@end example +@codequoteundirected off +@codequotebacktick off + +@end table @node sed addresses @@ -3085,7 +3363,7 @@ command appends a newline and the next line to the pattern space (i.e. @samp{1}, @samp{\n}, @samp{2} in the first cycle). @item The @code{l} command prints the content of the pattern space -unambigiously. +unambiguously. @item The @code{D} command then removes the content of pattern space up to the first newline (leaving @samp{2} at the end of @@ -3173,7 +3451,303 @@ and @ref{Line length adjustment}.
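The @code{r}, @code{R}, @code{w} and @code{W} entries above show what goes wrong, but not how to fix it for @code{w}. The sketch below is a minimal fix that ends the @code{w} command with a newline by splitting the script across @option{-e} options. It assumes GNU sed, `seq`, and `mktemp -d` (used only to get a scratch directory, and not part of the original examples):

```shell
#!/bin/sh
# Sketch assuming GNU sed and mktemp; runs in a throwaway directory.
cd "$(mktemp -d)" || exit 1

# Pitfall: the filename runs to the end of the line, so "; 2d" is not a
# command. Line 1 goes to a file literally named 'hello.txt ; 2d',
# and both input lines are printed.
seq 2 | sed '1w hello.txt ; 2d'

# Fix: each -e argument ends the command with a newline, so the filename
# is just "hello.txt" and 2d runs as a separate command (prints only 1).
seq 2 | sed -e '1w hello.txt' -e '2d'
cat hello.txt   # contains the single line 1
```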
@node Branching and flow control @section Branching and Flow Control -TODO +The branching commands @code{b}, @code{t}, and @code{T} enable +changing the flow of @command{sed} programs. + +By default, @command{sed} reads an input line into the pattern space, +then continues to process all commands in order. +Commands without addresses affect all lines. +Commands with addresses affect only matching lines. +@xref{Execution Cycle} and @ref{Addresses overview}. + +@command{sed} does not support a typical @code{if/then} construct. +Instead, some commands can be used as conditionals or to change the +default flow control: + +@table @code + +@item d +delete (clear) the current pattern space, +and restart the program cycle without processing the rest of the commands +and without printing the pattern space. + +@item D +delete the contents of the pattern space @emph{up to the first newline}, +and restart the program cycle without processing the rest of +the commands and without printing the pattern space. + +@item [addr]X +@itemx [addr]@{ X ; X ; X @} +@itemx /regexp/X +@itemx /regexp/@{ X ; X ; X @} +Addresses and regular expressions can be used as an @code{if/then} +conditional: If @var{[addr]} matches the current pattern space, +execute the command(s). +For example, the command @code{/^#/d} means: +@emph{if} the current pattern space matches the regular expression @code{^#} (a line +starting with a hash), @emph{then} execute the @code{d} command: +delete the line without printing it, and restart the program cycle +immediately. + +@item b +branch unconditionally (that is: always jump to a label, skipping +or repeating other commands, without restarting a new cycle). Combined +with an address, the branch can be conditionally executed on matched +lines. + +@item t +branch conditionally (that is: jump to a label) @emph{only if} an +@code{s///} command has succeeded since the last input line was read +or another conditional branch was taken.
+ +@item T +similar but opposite to the @code{t} command: branch only if +there have been @emph{no} successful substitutions since the last +input line was read. +@end table + + +The following two @command{sed} programs are equivalent. The first +(contrived) example uses the @code{b} command to skip the @code{s///} +command on lines containing @samp{1}. The second example uses an +address with negation (@samp{!}) to perform substitution only on +desired lines. The @code{y///} command is still executed on all +lines: + +@codequoteundirected on +@codequotebacktick on +@example +@group +$ printf '%s\n' a1 a2 a3 | sed -E '/1/bx ; s/a/z/ ; :x ; y/123/456/' +a4 +z5 +z6 + +$ printf '%s\n' a1 a2 a3 | sed -E '/1/!s/a/z/ ; y/123/456/' +a4 +z5 +z6 +@end group +@end example +@codequoteundirected off +@codequotebacktick off + + + +@subsection Branching and Cycles +@cindex labels +@cindex omitting labels +@cindex cycle, restarting +@cindex restarting a cycle +The @code{b}, @code{t} and @code{T} commands can be followed by a label +(typically a single letter). Labels are defined with a colon followed by +one or more letters (e.g. @samp{:x}). If the label is omitted, the +branch commands restart the cycle. Note the difference between +branching to a label and restarting the cycle: when a cycle is +restarted, @command{sed} first prints the current content of the +pattern space, then reads the next input line into the pattern space; +jumping to a label (even if it is at the beginning of the program) +does not print the pattern space and does not read the next input line. + +The following program is a no-op. The @code{b} command (the only command +in the program) does not have a label, and thus simply restarts the cycle.
+On each cycle, the pattern space is printed and the next input line is read: + +@example +@group +$ seq 3 | sed b +1 +2 +3 +@end group +@end example + +@cindex infinite loop, branching +@cindex branching, infinite loop +The following example is an infinite loop: it doesn't terminate and +doesn't print anything. The @code{b} command jumps to the @samp{x} +label, and a new cycle is never started: + +@codequoteundirected on +@codequotebacktick on +@example +@group +$ seq 3 | sed ':x ; bx' + +# The above command requires GNU sed (which supports additional +# commands following a label, without a newline). A portable equivalent: +# sed -e ':x' -e bx +@end group +@end example +@codequoteundirected off +@codequotebacktick off + +@cindex branching and n, N +@cindex n, and branching +@cindex N, and branching +Branching is often complemented with the @code{n} or @code{N} commands: +both commands read the next input line into the pattern space without waiting +for the cycle to restart. Before reading the next input line, @code{n} +prints the current pattern space then empties it, while @code{N} +appends a newline and the next input line to the pattern space. + +Consider the following two examples: + +@codequoteundirected on +@codequotebacktick on +@example +@group +$ seq 3 | sed ':x ; n ; bx' +1 +2 +3 + +$ seq 3 | sed ':x ; N ; bx' +1 +2 +3 +@end group +@end example +@codequoteundirected off +@codequotebacktick off + +@itemize +@item +Neither example loops forever, despite never starting a new cycle. + +@item +In the first example, the @code{n} command first prints the content +of the pattern space, empties the pattern space, then reads the next +input line. + +@item +In the second example, the @code{N} command appends the next input +line to the pattern space (with a newline). Lines are accumulated in +the pattern space until there are no more input lines to read, then +the @code{N} command terminates the @command{sed} program.
When the +program terminates, the end-of-cycle actions are performed, and the +entire pattern space is printed. + +@item +The second example requires @value{SSED}, +because it uses the non-POSIX-standard behavior of @code{N}. +See the ``@code{N} command on the last line'' paragraph +in @ref{Reporting Bugs}. + +@item +To further examine the difference between the two examples, +try the following commands: +@codequoteundirected on +@codequotebacktick on +@example +@group +printf '%s\n' aa bb cc dd | sed ':x ; n ; = ; bx' +printf '%s\n' aa bb cc dd | sed ':x ; N ; = ; bx' +printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx' +printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx' +@end group +@end example +@codequoteundirected off +@codequotebacktick off + +@end itemize + + + +@subsection Branching example: joining lines + +@cindex joining lines with branching +@cindex branching, joining lines +@cindex quoted-printable lines, joining +@cindex joining quoted-printable lines +@cindex t, joining lines with +@cindex b, joining lines with +@cindex b, versus t +@cindex t, versus b +As a real-world example of using branching, consider the case of +@uref{https://en.wikipedia.org/wiki/Quoted-printable,quoted-printable} files, +typically used to encode email messages. +In these files long lines are split and marked with a @dfn{soft line break} +consisting of a single @samp{=} character at the end of the line: + +@example +@group +$ cat jaques.txt +All the wor= +ld's a stag= +e, +And all the= + men and wo= +men merely = +players: +They have t= +heir exits = +and their e= +ntrances; +And one man= + in his tim= +e plays man= +y parts. 
+@end group +@end example + + +The following program uses an address match @samp{/=$/} as a +conditional: If the current pattern space ends with a @samp{=}, it +reads the next input line using @code{N}, removes every @samp{=} +that is followed by a newline (together with that newline), and unconditionally +branches (@code{b}) to the beginning of the program without restarting +a new cycle. If the pattern space does not end with @samp{=}, the +default action is performed: the pattern space is printed and a new +cycle is started: + +@codequoteundirected on +@codequotebacktick on +@example +@group +$ sed ':x ; /=$/ @{ N ; s/=\n//g ; bx @}' jaques.txt +All the world's a stage, +And all the men and women merely players: +They have their exits and their entrances; +And one man in his time plays many parts. +@end group +@end example +@codequoteundirected off +@codequotebacktick off + +Here's an alternative program with a slightly different approach: on +all lines except the last, @code{N} appends the next line to the pattern +space. A substitution command then removes soft line breaks +(@samp{=} at the end of a line, i.e. followed by a newline) by replacing +them with an empty string. +@emph{If} the substitution was successful (meaning the pattern space contained +a line which should be joined), the conditional branch command @code{t} jumps +to the beginning of the program without completing or restarting the cycle. +If the substitution failed (meaning there were no soft line breaks), +the @code{t} command will @emph{not} branch. Then, @code{P} will +print the pattern space content until the first newline, and @code{D} +will delete the pattern space content until the first newline. +(To learn more about the @code{N}, @code{P} and @code{D} commands, +@pxref{Multiline techniques}).
+ + +@codequoteundirected on +@codequotebacktick on +@example +@group +$ sed ':x ; $!N ; s/=\n// ; tx ; P ; D' jaques.txt +All the world's a stage, +And all the men and women merely players: +They have their exits and their entrances; +And one man in his time plays many parts. +@end group +@end example +@codequoteundirected off +@codequotebacktick off + + +For more line-joining examples, see @ref{Joining lines}. + @node Examples @chapter Some Sample Scripts @@ -3213,6 +3787,10 @@ Emulating standard utilities: @node Joining lines @section Joining lines +This section uses @code{N}, @code{D} and @code{P} commands to process +multiple lines, and the @code{b} and @code{t} commands for branching. +@xref{Multiline techniques} and @ref{Branching and flow control}. + Join specific lines (e.g. if lines 2 and 3 need to be joined): @codequoteundirected on @@ -3232,7 +3810,7 @@ hello @codequoteundirected off @codequotebacktick off -Join lines ending with backslashes: +Join backslash-continued lines: @codequoteundirected on @codequotebacktick on @@ -3257,6 +3835,42 @@ and another line @codequoteundirected off @codequotebacktick off +Join lines that start with whitespace (e.g. SMTP headers): + +@codequoteundirected on +@codequotebacktick on +@example +@group +$ cat 2.txt +Subject: Hello + World +Content-Type: multipart/alternative; + boundary=94eb2c190cc6370f06054535da6a +Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT) +Authentication-Results: mx.gnu.org; + dkim=pass header.i=@@gnu.org; + spf=pass +Message-ID: <abcdef@@gnu.org> +From: John Doe <jdoe@@gnu.org> +To: Jane Smith <jsmith@@gnu.org> + +$ sed -E ':a ; $!N ; s/\n\s+/ / ; ta ; P ; D' 2.txt +Subject: Hello World +Content-Type: multipart/alternative; boundary=94eb2c190cc6370f06054535da6a +Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT) +Authentication-Results: mx.gnu.org; dkim=pass header.i=@@gnu.org; spf=pass +Message-ID: <abcdef@@gnu.org> +From: John Doe <jdoe@@gnu.org> +To: Jane Smith <jsmith@@gnu.org> + +# A portable (non-GNU)
variation: +# sed -e :a -e '$!N;s/\n */ /;ta' -e 'P;D' +@end group +@end example +@codequoteundirected off +@codequotebacktick off + + @node Centering lines @section Centering Lines @@ -4505,24 +5119,35 @@ the size of the buffer that can be processed by certain patterns. @node Other Resources @chapter Other Resources for Learning About @command{sed} +For up-to-date information about @value{SSED}, please +visit @uref{https://www.gnu.org/software/sed/}. + +Send general questions and suggestions to @email{sed-devel@@gnu.org}. +Visit the mailing list archives for past discussions at +@uref{https://lists.gnu.org/archive/html/sed-devel/}. + @cindex Additional reading about @command{sed} -In addition to several books that have been written about @command{sed} -(either specifically or as chapters in books which discuss -shell programming), one can find out more about @command{sed} -(including suggestions of a few books) from the FAQ -for the @code{sed-users} mailing list, available from: -@display -@uref{http://sed.sourceforge.net/sedfaq.html} -@end display - -Also of interest are -@uref{http://www.student.northpark.edu/pemente/sed/index.htm} -and @uref{http://sed.sf.net/grabbag}, -which include @command{sed} tutorials and other @command{sed}-related goodies. - -The @code{sed-users} mailing list itself maintained by Sven Guckes. -To subscribe, visit @uref{http://groups.yahoo.com} and search -for the @code{sed-users} mailing list. +The following resources provide information about @command{sed} +(both @value{SSED} and other variations). Note that these are not maintained by +@value{SSED} developers. + +@itemize @bullet + +@item +sed @code{$HOME}: @uref{http://sed.sf.net} + +@item +sed FAQ: @uref{http://sed.sf.net/sedfaq.html} + +@item +seder's grabbag: @uref{http://sed.sf.net/grabbag} + +@item +The @code{sed-users} mailing list maintained by Sven Guckes: +@uref{http://groups.yahoo.com/group/sed-users/} +(note this is @emph{not} the @value{SSED} mailing list).
+ +@end itemize @node Reporting Bugs @chapter Reporting Bugs |