Diffstat (limited to 'doc/sed.texi')
-rw-r--r-- doc/sed.texi 679
1 files changed, 652 insertions, 27 deletions
diff --git a/doc/sed.texi b/doc/sed.texi
index 121e405..7cc67d6 100644
--- a/doc/sed.texi
+++ b/doc/sed.texi
@@ -34,7 +34,7 @@
This file documents version @value{VERSION} of
@value{SSED}, a stream editor.
-Copyright @copyright{} 1998-2016 Free Software Foundation, Inc.
+Copyright @copyright{} 1998-2017 Free Software Foundation, Inc.
@quotation
Permission is granted to copy, distribute and/or modify this document
@@ -484,6 +484,7 @@ $ echo | sed 'Q42' ; echo $?
* Other Commands:: Less frequently used commands
* Programming Commands:: Commands for @command{sed} gurus
* Extended Commands:: Commands specific of @value{SSED}
+* Multiple commands syntax:: Extension for easier scripting
@end menu
@node sed script overview
@@ -586,6 +587,7 @@ thus should be terminated
with newlines or be placed at the end of a @var{script} or @var{script-file}.
Commands can also be preceded with optional non-significant
whitespace characters.
+@xref{Multiple commands syntax}.
@@ -903,12 +905,12 @@ while in general flags for the @code{s} command show their
effect just once. This behavior, although documented, might
change in future versions.
-@item w @var{file-name}
+@item w @var{filename}
@cindex Text, writing to a file after substitution
@cindex @value{SSEDEXT}, @file{/dev/stdout} file
@cindex @value{SSEDEXT}, @file{/dev/stderr} file
If the substitution was made, then write out the result to the named file.
-As a @value{SSED} extension, two special values of @var{file-name} are
+As a @value{SSED} extension, two special values of @var{filename} are
supported: @file{/dev/stderr}, which writes the result to the standard
error, and @file{/dev/stdout}, which writes to the standard
output.@footnote{This is equivalent to @code{p} unless the @option{-i}
@@ -1149,7 +1151,7 @@ hello
@codequoteundirected off
@codequotebacktick off
-Leading whitespaces after the @code{a} command are ignored.
+Leading whitespace after the @code{a} command is ignored.
The text to add is read until the end of the line.
@@ -1238,7 +1240,7 @@ hello
@codequoteundirected off
@codequotebacktick off
-Leading whitespaces after the @code{i} command are ignored.
+Leading whitespace after the @code{i} command is ignored.
The text to add is read until the end of the line.
@item i\
@@ -1319,7 +1321,7 @@ hello
@codequoteundirected off
@codequotebacktick off
-Leading whitespaces after the @code{c} command are ignored.
+Leading whitespace after the @code{c} command is ignored.
The text to add is read until the end of the line.
@item c\
@@ -1464,7 +1466,7 @@ file will then be reread and inserted on each of the addressed lines.
@cindex @value{SSEDEXT}, @file{/dev/stdout} file
@cindex @value{SSEDEXT}, @file{/dev/stderr} file
Write the pattern space to @var{filename}.
-As a @value{SSED} extension, two special values of @var{file-name} are
+As a @value{SSED} extension, two special values of @var{filename} are
supported: @file{/dev/stderr}, which writes the result to the standard
error, and @file{/dev/stdout}, which writes to the standard
output.@footnote{This is equivalent to @code{p} unless the @option{-i}
@@ -1706,7 +1708,283 @@ script in most multibyte locales (including UTF-8 locales).
@end table
+@node Multiple commands syntax
+@section Multiple commands syntax
+
+@c POSIX says:
+@c Editing commands other than {...}, a, b, c, i, r, t, w, :, and #
+@c can be followed by a <semicolon>, optional <blank> characters, and
+@c another editing command. However, when an s editing command is used
+@c with the w flag, following it with another command in this manner
+@c produces undefined results.
+
+There are several methods to specify multiple commands in a @command{sed}
+program.
+
+Using newlines is most natural when running a @command{sed} script from a file
+(using the @option{-f} option).
+
+On the command line, all @command{sed} commands may be separated by newlines.
+Alternatively, you may specify each command as an argument to an @option{-e}
+option:
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+$ seq 6 | sed '1d
+3d
+5d'
+2
+4
+6
+
+$ seq 6 | sed -e 1d -e 3d -e 5d
+2
+4
+6
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+A semicolon (@samp{;}) may be used to separate most simple commands:
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+$ seq 6 | sed '1d;3d;5d'
+2
+4
+6
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+The @code{@{},@code{@}},@code{b},@code{t},@code{T},@code{:} commands can
+be separated with a semicolon (this is a non-portable @value{SSED} extension).
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+$ seq 4 | sed '@{1d;3d@}'
+2
+4
+
+$ seq 6 | sed '@{1d;3d@};5d'
+2
+4
+6
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+Labels used in @code{b},@code{t},@code{T},@code{:} commands are read
+until a semicolon. Leading and trailing whitespace is ignored. In
+the examples below the label is @samp{x}. The first example works
+with @value{SSED}. The second is a portable equivalent. For more
+information about branching and labels, @pxref{Branching and flow
+control}.
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+$ seq 3 | sed '/1/b x ; s/^/=/ ; :x ; 3d'
+1
+=2
+
+$ seq 3 | sed -e '/1/bx' -e 's/^/=/' -e ':x' -e '3d'
+1
+=2
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+
+
+@subsection Commands Requiring a newline
+
+The following commands cannot be separated by a semicolon and
+require a newline:
+
+@table @asis
+
+@item @code{a},@code{c},@code{i} (append/change/insert)
+
+All characters following @code{a},@code{c},@code{i} commands are taken
+as the text to append/change/insert. Using a semicolon leads to
+undesirable results:
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+$ seq 2 | sed '1aHello ; 2d'
+1
+Hello ; 2d
+2
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+Separate the commands using @option{-e} or a newline:
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+$ seq 2 | sed -e 1aHello -e 2d
+1
+Hello
+
+$ seq 2 | sed '1aHello
+2d'
+1
+Hello
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+Note that specifying the text to add (@samp{Hello}) immediately
+after @code{a},@code{c},@code{i} is itself a @value{SSED} extension.
+A portable, POSIX-compliant alternative is:
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+$ seq 2 | sed '1a\
+Hello
+2d'
+1
+Hello
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+@item @code{#} (comment)
+
+All characters following @samp{#} until the next newline are ignored.
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+$ seq 3 | sed '# this is a comment ; 2d'
+1
+2
+3
+
+
+$ seq 3 | sed '# this is a comment
+2d'
+1
+3
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+@item @code{r},@code{R},@code{w},@code{W} (reading and writing files)
+
+The @code{r},@code{R},@code{w},@code{W} commands parse the filename
+until end of the line. If whitespace, comments or semicolons are found,
+they will be included in the filename, leading to unexpected results:
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+$ seq 2 | sed '1w hello.txt ; 2d'
+1
+2
+
+$ ls -log
+total 4
+-rw-rw-r-- 1 2 Jan 23 23:03 hello.txt ; 2d
+
+$ cat 'hello.txt ; 2d'
+1
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+Note that @command{sed} silently ignores read/write errors in
+@code{r},@code{R},@code{w},@code{W} commands (such as missing files).
+In the following example, @command{sed} tries to read a file named
+@samp{@file{hello.txt ; N}}. The file is missing, and the error is silently
+ignored:
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+$ echo x | sed '1rhello.txt ; N'
+x
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+@item @code{e} (command execution)
+
+Any characters following the @code{e} command until the end of the line
+will be sent to the shell. If whitespace, comments or semicolons are found,
+they will be included in the shell command, leading to unexpected results:
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+$ echo a | sed '1e touch foo#bar'
+a
+
+$ ls -1
+foo#bar
+
+$ echo a | sed '1e touch foo ; s/a/b/'
+sh: 1: s/a/b/: not found
+a
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+
+@item @code{s///[we]} (substitute with @code{e} or @code{w} flags)
+
+In a substitution command, the @code{w} flag writes the substitution
+result to a file, and the @code{e} flag executes the substitution result
+as a shell command. As with the @code{r/R/w/W/e} commands, these
+must be terminated with a newline. If whitespace, comments or semicolons
+are found, they will be included in the shell command or filename, leading to
+unexpected results:
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+$ echo a | sed 's/a/b/w1.txt#foo'
+b
+
+$ ls -1
+1.txt#foo
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+@end table
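The silent handling of a missing file by @code{r}, described in the table above, can be checked with a minimal sketch like the following. The scratch directory and the file name @file{missing.txt} are hypothetical, chosen only so that the file is certain not to exist. The sketch assumes @command{mktemp} is available.

```shell
# Hypothetical scratch directory: guarantees that missing.txt does not exist.
dir=$(mktemp -d)

# The r command names a file that is not there; sed reports no error
# and simply has nothing extra to output.
out=$(echo x | sed "r $dir/missing.txt")
status=$?

rmdir "$dir"
printf 'output=%s status=%s\n' "$out" "$status"
```

The input line is passed through unchanged and the exit status stays zero, so a mistyped filename (for example one that swallowed a trailing @samp{; N}) gives no visible hint that anything went wrong.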
@node sed addresses
@@ -3085,7 +3363,7 @@ command appends a newline and the next line to the pattern space
(i.e. @samp{1}, @samp{\n}, @samp{2} in the first cycle).
@item
The @code{l} command prints the content of the pattern space
-unambigiously.
+unambiguously.
@item
The @code{D} command then removes the content of pattern
space up to the first newline (leaving @samp{2} at the end of
@@ -3173,7 +3451,303 @@ and @ref{Line length adjustment}.
@node Branching and flow control
@section Branching and Flow Control
-TODO
+The branching commands @code{b}, @code{t}, and @code{T} enable
+changing the flow of @command{sed} programs.
+
+By default, @command{sed} reads an input line into the pattern space,
+then continues to process all commands in order.
+Commands without addresses affect all lines.
+Commands with addresses affect only matching lines.
+@xref{Execution Cycle} and @ref{Addresses overview}.
+
+@command{sed} does not support a typical @code{if/then} construct.
+Instead, some commands can be used as conditionals or to change the
+default flow control:
+
+@table @code
+
+@item d
+delete (clear) the current pattern space,
+and restart the program cycle without processing the rest of the commands
+and without printing the pattern space.
+
+@item D
+delete the contents of the pattern space @emph{up to the first newline},
+and restart the program cycle without processing the rest of
+the commands and without printing the pattern space.
+
+@item [addr]X
+@itemx [addr]@{ X ; X ; X @}
+@itemx /regexp/X
+@itemx /regexp/@{ X ; X ; X @}
+Addresses and regular expressions can be used as an @code{if/then}
+conditional: If @var{[addr]} matches the current pattern space,
+execute the command(s).
+For example: The command @code{/^#/d} means:
+@emph{if} the current pattern space matches the regular expression @code{^#} (a line
+starting with a hash), @emph{then} execute the @code{d} command:
+delete the line without printing it, and restart the program cycle
+immediately.
+
+@item b
+branch unconditionally (that is: always jump to a label, skipping
+or repeating other commands, without restarting a new cycle). Combined
+with an address, the branch can be conditionally executed on matched
+lines.
+
+@item t
+branch conditionally (that is: jump to a label) @emph{only if} a
+@code{s///} command has succeeded since the last input line was read
+or another conditional branch was taken.
+
+@item T
+similar but opposite to the @code{t} command: branch only if
+there have been @emph{no} successful substitutions since the last
+input line was read.
+@end table
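As a minimal sketch of the difference between @code{t} and @code{T} described in the table above, the two pipelines below differ only in the branch command. The sample input (@samp{a1}, @samp{b2}) and the variable names are illustrative assumptions. Both branches omit the label, so a taken branch skips the final @code{s} command and ends the cycle. Note that @code{T} is a @value{SSED} extension, so the second pipeline is not portable.

```shell
# t: branch (here: to the end of the script) only if s/a/z/ succeeded,
# so the "!" suffix is added only to lines where nothing was replaced.
with_t=$(printf '%s\n' a1 b2 | sed -e 's/a/z/' -e 't' -e 's/$/!/')

# T (GNU extension): branch only if s/a/z/ did NOT succeed,
# so the "!" suffix is added only to lines where a replacement happened.
with_T=$(printf '%s\n' a1 b2 | sed -e 's/a/z/' -e 'T' -e 's/$/!/')

printf '%s\n' "with t:" "$with_t" "with T:" "$with_T"
```

With @code{t} the result is @samp{z1} and @samp{b2!}; with @code{T} it is @samp{z1!} and @samp{b2}.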
+
+
+The following two @command{sed} programs are equivalent. The first
+(contrived) example uses the @code{b} command to skip the @code{s///}
+command on lines containing @samp{1}. The second example uses an
+address with negation (@samp{!}) to perform substitution only on
+desired lines. The @code{y///} command is still executed on all
+lines:
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+$ printf '%s\n' a1 a2 a3 | sed -E '/1/bx ; s/a/z/ ; :x ; y/123/456/'
+a4
+z5
+z6
+
+$ printf '%s\n' a1 a2 a3 | sed -E '/1/!s/a/z/ ; y/123/456/'
+a4
+z5
+z6
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+
+
+@subsection Branching and Cycles
+@cindex labels
+@cindex omitting labels
+@cindex cycle, restarting
+@cindex restarting a cycle
+The @code{b},@code{t} and @code{T} commands can be followed by a label
+(typically a single letter). Labels are defined with a colon followed by
+one or more letters (e.g. @samp{:x}). If the label is omitted the
+branch commands restart the cycle. Note the difference between
+branching to a label and restarting the cycle: when a cycle is
+restarted, @command{sed} first prints the current content of the
+pattern space, then reads the next input line into the pattern space.
+Jumping to a label (even if it is at the beginning of the program)
+does not print the pattern space and does not read the next input line.
+
+The following program is a no-op. The @code{b} command (the only command
+in the program) does not have a label, and thus simply restarts the cycle.
+On each cycle, the pattern space is printed and the next input line is read:
+
+@example
+@group
+$ seq 3 | sed b
+1
+2
+3
+@end group
+@end example
+
+@cindex infinite loop, branching
+@cindex branching, infinite loop
+The following example is an infinite loop: it doesn't terminate and
+doesn't print anything. The @code{b} command jumps to the @samp{x}
+label, and a new cycle is never started:
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+$ seq 3 | sed ':x ; bx'
+
+# The above command requires GNU sed (which supports additional
+# commands following a label, without a newline). A portable equivalent:
+# sed -e ':x' -e bx
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+@cindex branching and n, N
+@cindex n, and branching
+@cindex N, and branching
+Branching is often complemented with the @code{n} or @code{N} commands:
+both commands read the next input line into the pattern space without waiting
+for the cycle to restart. Before reading the next input line, @code{n}
+prints the current pattern space then empties it, while @code{N}
+appends a newline and the next input line to the pattern space.
+
+Consider the following two examples:
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+$ seq 3 | sed ':x ; n ; bx'
+1
+2
+3
+
+$ seq 3 | sed ':x ; N ; bx'
+1
+2
+3
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+@itemize
+@item
+Neither example loops infinitely, despite never starting a new cycle.
+
+@item
+In the first example, the @code{n} command first prints the content
+of the pattern space, empties the pattern space, then reads the next
+input line.
+
+@item
+In the second example, the @code{N} command appends the next input
+line to the pattern space (with a newline). Lines are accumulated in
+the pattern space until there are no more input lines to read, then
+the @code{N} command terminates the @command{sed} program. When the
+program terminates, the end-of-cycle actions are performed, and the
+entire pattern space is printed.
+
+@item
+The second example requires @value{SSED},
+because it uses the non-POSIX-standard behavior of @code{N}.
+See the ``@code{N} command on the last line'' paragraph
+in @ref{Reporting Bugs}.
+
+@item
+To further examine the difference between the two examples,
+try the following commands:
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+printf '%s\n' aa bb cc dd | sed ':x ; n ; = ; bx'
+printf '%s\n' aa bb cc dd | sed ':x ; N ; = ; bx'
+printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx'
+printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx'
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+@end itemize
+
+
+
+@subsection Branching example: joining lines
+
+@cindex joining lines with branching
+@cindex branching, joining lines
+@cindex quoted-printable lines, joining
+@cindex joining quoted-printable lines
+@cindex t, joining lines with
+@cindex b, joining lines with
+@cindex b, versus t
+@cindex t, versus b
+As a real-world example of using branching, consider the case of
+@uref{https://en.wikipedia.org/wiki/Quoted-printable,quoted-printable} files,
+typically used to encode email messages.
+In these files long lines are split and marked with a @dfn{soft line break}
+consisting of a single @samp{=} character at the end of the line:
+
+@example
+@group
+$ cat jaques.txt
+All the wor=
+ld's a stag=
+e,
+And all the=
+ men and wo=
+men merely =
+players:
+They have t=
+heir exits =
+and their e=
+ntrances;
+And one man=
+ in his tim=
+e plays man=
+y parts.
+@end group
+@end example
+
+
+The following program uses an address match @samp{/=$/} as a
+conditional: If the current pattern space ends with a @samp{=}, it
+reads the next input line using @code{N}, replaces all @samp{=}
+characters which are followed by a newline, and unconditionally
+branches (@code{b}) to the beginning of the program without restarting
+a new cycle. If the pattern space does not end with @samp{=}, the
+default action is performed: the pattern space is printed and a new
+cycle is started:
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+$ sed ':x ; /=$/ @{ N ; s/=\n//g ; bx @}' jaques.txt
+All the world's a stage,
+And all the men and women merely players:
+They have their exits and their entrances;
+And one man in his time plays many parts.
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+Here's an alternative program with a slightly different approach: On
+all lines except the last, @code{N} appends the next line to the pattern
+space. A substitution command then removes soft line breaks
+(@samp{=} at the end of a line, i.e. followed by a newline) by replacing
+them with an empty string.
+@emph{If} the substitution was successful (meaning the pattern space contained
+a line which should be joined), the conditional branch command @code{t} jumps
+to the beginning of the program without completing or restarting the cycle.
+If the substitution failed (meaning there were no soft line breaks),
+the @code{t} command will @emph{not} branch. Then, @code{P} will
+print the pattern space content until the first newline, and @code{D}
+will delete the pattern space content until the first newline.
+(To learn more about the @code{N}, @code{P} and @code{D} commands,
+@pxref{Multiline techniques}.)
+
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+$ sed ':x ; $!N ; s/=\n// ; tx ; P ; D' jaques.txt
+All the world's a stage,
+And all the men and women merely players:
+They have their exits and their entrances;
+And one man in his time plays many parts.
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
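The @code{t}-based program above places commands after the @samp{:x} label on the same line, which relies on @value{SSED}. A minimal sketch of an equivalent that gives each command its own @option{-e} option follows. It feeds the first seven lines of the sample text through @command{printf} instead of reading @file{jaques.txt}; the variable name @code{joined} is an illustrative assumption. This form is expected to suit other @command{sed} implementations as well, but it is offered as a sketch, not a tested guarantee.

```shell
# Each command is a separate -e argument, so the label ":x" ends at the
# end of its own argument instead of at a semicolon.
joined=$(printf '%s\n' \
    'All the wor=' \
    "ld's a stag=" \
    'e,' \
    'And all the=' \
    ' men and wo=' \
    'men merely =' \
    'players:' |
  sed -e ':x' -e '$!N' -e 's/=\n//' -e 'tx' -e 'P' -e 'D')

printf '%s\n' "$joined"
```

The seven input lines collapse into the first two lines of the joined text shown above.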
+
+
+For more line-joining examples, @pxref{Joining lines}.
+
@node Examples
@chapter Some Sample Scripts
@@ -3213,6 +3787,10 @@ Emulating standard utilities:
@node Joining lines
@section Joining lines
+This section uses @code{N}, @code{D} and @code{P} commands to process
+multiple lines, and the @code{b} and @code{t} commands for branching.
+@xref{Multiline techniques} and @ref{Branching and flow control}.
+
Join specific lines (e.g. if lines 2 and 3 need to be joined):
@codequoteundirected on
@@ -3232,7 +3810,7 @@ hello
@codequoteundirected off
@codequotebacktick off
-Join lines ending with backslashes:
+Join backslash-continued lines:
@codequoteundirected on
@codequotebacktick on
@@ -3257,6 +3835,42 @@ and another line
@codequoteundirected off
@codequotebacktick off
+Join lines that start with whitespace (e.g. SMTP headers):
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+@group
+$ cat 2.txt
+Subject: Hello
+ World
+Content-Type: multipart/alternative;
+ boundary=94eb2c190cc6370f06054535da6a
+Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
+Authentication-Results: mx.gnu.org;
+ dkim=pass header.i=@@gnu.org;
+ spf=pass
+Message-ID: <abcdef@@gnu.org>
+From: John Doe <jdoe@@gnu.org>
+To: Jane Smith <jsmith@@gnu.org>
+
+$ sed -E ':a ; $!N ; s/\n\s+/ / ; ta ; P ; D' 2.txt
+Subject: Hello World
+Content-Type: multipart/alternative; boundary=94eb2c190cc6370f06054535da6a
+Date: Tue, 3 Jan 2017 19:41:16 +0000 (GMT)
+Authentication-Results: mx.gnu.org; dkim=pass header.i=@@gnu.org; spf=pass
+Message-ID: <abcdef@@gnu.org>
+From: John Doe <jdoe@@gnu.org>
+To: Jane Smith <jsmith@@gnu.org>
+
+# A portable (non-GNU) variation:
+# sed -e :a -e '$!N;s/\n  */ /;ta' -e 'P;D'
+@end group
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+
@node Centering lines
@section Centering Lines
@@ -4505,24 +5119,35 @@ the size of the buffer that can be processed by certain patterns.
@node Other Resources
@chapter Other Resources for Learning About @command{sed}
+For up-to-date information about @value{SSED} please
+visit @uref{https://www.gnu.org/software/sed/}.
+
+Send general questions and suggestions to @email{sed-devel@@gnu.org}.
+Visit the mailing list archives for past discussions at
+@uref{https://lists.gnu.org/archive/html/sed-devel/}.
+
@cindex Additional reading about @command{sed}
-In addition to several books that have been written about @command{sed}
-(either specifically or as chapters in books which discuss
-shell programming), one can find out more about @command{sed}
-(including suggestions of a few books) from the FAQ
-for the @code{sed-users} mailing list, available from:
-@display
-@uref{http://sed.sourceforge.net/sedfaq.html}
-@end display
-
-Also of interest are
-@uref{http://www.student.northpark.edu/pemente/sed/index.htm}
-and @uref{http://sed.sf.net/grabbag},
-which include @command{sed} tutorials and other @command{sed}-related goodies.
-
-The @code{sed-users} mailing list itself maintained by Sven Guckes.
-To subscribe, visit @uref{http://groups.yahoo.com} and search
-for the @code{sed-users} mailing list.
+The following resources provide information about @command{sed}
+(both @value{SSED} and other variations). Note that these are not maintained by
+@value{SSED} developers.
+
+@itemize @bullet
+
+@item
+sed @code{$HOME}: @uref{http://sed.sf.net}
+
+@item
+sed FAQ: @uref{http://sed.sf.net/sedfaq.html}
+
+@item
+seder's grabbag: @uref{http://sed.sf.net/grabbag}
+
+@item
+The @code{sed-users} mailing list maintained by Sven Guckes:
+@uref{http://groups.yahoo.com/group/sed-users/}
+(note this is @emph{not} the @value{SSED} mailing list).
+
+@end itemize
@node Reporting Bugs
@chapter Reporting Bugs