                   Sed - A Non-interactive Text Editor




                                                 Lee E. McMahon

                                                 February 16, 1981

                            TABLE OF CONTENTS


1.    Introduction

2.    Overall Operation

3.    Command Line Flags

3.1      Order of Application of Editing Commands
3.2      Pattern Space
3.3      Examples

4.    Addresses: Selecting Lines for Editing

4.1      Line Number Addresses
4.2      Context Addresses
4.3      Number of Addresses
4.4      Examples

5.    Functions

5.1      Whole Line Oriented Functions

5.1.1       Examples

5.2      Substitute Function

5.2.1       Examples

5.3      Input/Output Functions

5.3.1       Examples

5.4      Multiple Input Line Functions
5.5      Hold and Get Functions

5.5.1       Examples

5.6      Control Flow Functions
5.7      Miscellaneous Functions

    Edited for UTS




1.    INTRODUCTION

Sed is a non-interactive context editor designed to be especially  useful
in three cases:

 1.  To edit files too large for comfortable interactive editing.

 2.  To edit any size file when the  sequence of editing commands is  too
     complicated to be comfortably typed in interactive mode.

 3.  To perform multiple  'global' editing functions  efficiently in  one
     pass through the input.

Since only a few lines  of the input reside in  core at one time, and  no
temporary files are used, the  effective size of file that can be  edited
is limited only by the requirement that  the input and output fit  simul-
taneously into available secondary storage.

Complicated editing scripts can be created separately and given to sed as
a command file.  For complex  edits, this saves considerable typing,  and
its attendant errors.  Sed running from a command file is much more effi-
cient than any interactive editor known to us, even if that editor can be
driven by a prewritten script.

The principal losses of function compared to an interactive editor are
the lack of relative addressing (because of the line-at-a-time operation)
and the lack of immediate verification that a command has done what was
intended.

Sed is a lineal descendant of the UTS editor, ed.  Because of the differ-
ences between  interactive  and non-interactive  operation,  considerable
changes have been made  between ed and  sed; even confirmed  users of  ed
will frequently be surprised (and probably chagrined), if they rashly use
sed without reading {#4} and  {#5} of this  document.  The most  striking
family resemblance between the  two editors is  in the class of  patterns
('regular expressions') they recognize; the code for matching patterns is
copied almost verbatim from the  code for ed.  (This code was written  by
Dennis M. Ritchie.)

2.    OVERALL OPERATION

Sed by default copies the standard input to the standard output,  perhaps
performing one or more editing commands on each line before writing it to
the output.  This behavior may be modified by flags on the command  line;
see {#3}.

The general format of an editing command is:

     [address1 [,address2]] function [arguments]

One or both addresses may be omitted; the format of addresses is given in
{#4}.  Any number of blanks  or tabs may separate the addresses from  the
function.  The function must be present; the available commands are  dis-
cussed in {#5}.  The arguments may be required or optional,  according to
which function is given;  again, they  are discussed in  {#5} under  each
individual function.

Tab characters and spaces at the beginning of lines are ignored.




3.    COMMAND LINE FLAGS

Three flags are recognized on the command line:

-n   tells sed not to copy all lines, but only those specified by p func-
     tions or p flags after s functions; see {#5.3} and {#5.2}.

-e   tells sed to take the next argument as an editing command.

-f   tells sed to take the next argument as a file name; the file  should
     contain editing commands, one to a line.
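
As a sketch of how the three flags combine, the following assumes a POSIX
shell, a POSIX-conforming sed, and the mktemp utility; the variable names
and the temporary script file are illustrative only.  The input is the
sample text introduced in {#3.3}.

```shell
input='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# -n suppresses the default copy; only the p function produces output.
only_line2=$(printf '%s\n' "$input" | sed -n -e '2p')

# Several -e arguments are compiled in the order in which they are given.
two_edits=$(printf '%s\n' "$input" | sed -n -e 's/Kubla/Kublai/' -e '1p')

# -f reads the same kind of commands from a file, one to a line.
script=$(mktemp)
printf '%s\n' 's/Kubla/Kublai/' '1p' > "$script"
from_file=$(printf '%s\n' "$input" | sed -n -f "$script")
rm -f "$script"

printf '%s\n' "$only_line2" "$two_edits" "$from_file"
```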


3.1      ORDER OF APPLICATION OF EDITING COMMANDS

Before any editing is done  (before any input file  is even opened),  all
the editing commands  are compiled  into a form  that will be  moderately
efficient during the execution  phase (when the  commands are applied  to
lines of  the input file).   The commands  are compiled in  the order  in
which they are encountered;  this is  generally the order  in which  they
will be attempted at execution  time.  The commands are applied one at  a
time; the input to each command is the output of all preceding commands.
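
The effect of command order can be seen in a small sketch (assuming a
POSIX shell and sed; the variable names are illustrative).  The same two
substitutions give different results depending on which is compiled
first, because each command works on the output of the one before it.

```shell
line='In Xanadu did Kubla Khan'

# The second command sees "Kublai", the output of the first command.
forward=$(printf '%s\n' "$line" |
    sed -e 's/Kubla/Kublai/' -e 's/Kublai/the/')

# Reversed, the first command finds nothing to change, so only the
# second substitution takes effect.
reversed=$(printf '%s\n' "$line" |
    sed -e 's/Kublai/the/' -e 's/Kubla/Kublai/')

printf '%s\n' "$forward" "$reversed"
```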

The default linear order of application of editing commands can be
changed by the control flow commands, t and b; see {#5.6}.  Even when the
order of application is changed by these commands, it is still true that
the input line to any command is the output of any previously applied
command.


3.2      PATTERN SPACE

The range of pattern  matches is called  the pattern space.   Ordinarily,
the pattern space is one  line of the input text, but more than one  line
can be read into the pattern space by using the N command; see {#5.4}.


3.3      EXAMPLES

Examples are  scattered  throughout the  text.   Except  where  otherwise
noted, the examples all assume the following input text:

     In Xanadu did Kubla Khan
     A stately pleasure dome decree:
     Where Alph, the sacred river, ran
     Through caverns measureless to man
     Down to a sunless sea.

(The output of the sed commands is not to be considered an improvement on
Coleridge.)

The command

     2q

will quit after copying  the first two  lines of the  input.  The  output
will be:

     In Xanadu did Kubla Khan
     A stately pleasure dome decree:




4.    ADDRESSES: SELECTING LINES FOR EDITING

Lines in the input file(s)  to which editing commands  are to be  applied
can be selected by  addresses.  Addresses may  be either line numbers  or
context addresses.

The application of a group of commands can be controlled by one address
(or address pair) by grouping the commands with curly braces '{}'; see
{#5.6}.


4.1      LINE NUMBER ADDRESSES

A line number is a decimal integer.  As each line is read from the input,
a line  number counter  is  incremented; a  line number  address  matches
(selects) the input line that  causes the internal  counter to equal  the
address line  number.   The counter  runs cumulatively  through  multiple
input files; it is not reset when a new input file is opened.

As a special case, the character

     $

matches the last line of the last input file.
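
A brief sketch of the cumulative counter and of '$', assuming a POSIX
shell, sed, and the mktemp utility (the two small files and their
contents are made up for the illustration):

```shell
dir=$(mktemp -d)
printf '%s\n' 'a1' 'a2' > "$dir/a"
printf '%s\n' 'b1' 'b2' > "$dir/b"

# Line 3 of the combined input is the first line of the second file,
# because the counter is not reset between files.
third=$(sed -n '3p' "$dir/a" "$dir/b")

# $ selects only the last line of the last file.
last=$(sed -n '$p' "$dir/a" "$dir/b")

rm -rf "$dir"
printf '%s\n' "$third" "$last"
```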


4.2      CONTEXT ADDRESSES

A context address is a pattern ('regular expression') enclosed in slashes
('/').  The regular expressions recognized by sed are formed as follows:

 1.  An ordinary character (other than those discussed below) is a  regu-
     lar expression, and matches that character.

 2.  A circumflex '^' at  the beginning of  a regular expression  matches
     the null character at the beginning of a line.

 3.  A dollar sign '$'  at the end  of a regular  expression matches  the
     null character at the end of a line.

 4.  The characters '\n' match  an imbedded new-line  character, but  not
     the new-line at the end of the pattern space.

 5.  A period '.' matches any  character except the terminal new-line  of
     the pattern space.

 6.  A regular expression followed by an asterisk '*' matches any  number
     (including 0) of adjacent  occurrences of the regular expression  it
     follows.

 7.  A string of characters in square brackets '[]' matches any character
     in the string, and no  others.  If, however, the first character  of
     the string is  circumflex '^',  the regular  expression matches  any
     character except the characters in the string and the terminal  new-
     line of the pattern space.

 8.  A concatenation of regular expressions is a regular expression that
     matches the concatenation of strings matched by the components of
     the regular expression.

 9.  A regular expression between the sequences '\(' and '\)' is  identi-
     cal in  effect to  the unadorned  regular expression,  but has  side
     effects that are described under the s command below and  specifica-
     tion 10 immediately below.

10.  The expression '\d' means the  same string of characters matched  by
     an expression enclosed in '\(' and '\)' earlier in the same pattern.
     Here d is a  single digit;  the string specified  is that  beginning
     with the dth occurrence of  '\(' counting from the left.  For  exam-
     ple, the expression '^\(.*\)\1'  matches a line  beginning with  two
     repeated occurrences of the same string.

11.  The  null  regular  expression  standing   alone  (e.g.,  '//')   is
     equivalent to the last regular expression compiled.

To use a special character (^ $ .  * [ ] \ /)  as a literal (to match  an
occurrence of itself in  the input), precede  the special character by  a
backslash '\'.

For a context address to 'match' the  input requires that the whole  pat-
tern within the address match some portion of the pattern space.


4.3      NUMBER OF ADDRESSES

The commands in the next  section can have 0, 1,  or 2 addresses.   Under
each command the  maximum number  of allowed addresses  is given.  For  a
command to have more addresses than the maximum allowed is considered  an
error.

If a command has no addresses, it is applied to every line in the input.

If a command has one address, it is applied to all lines that match  that
address.

If a command  has two addresses,  it is  applied to the  first line  that
matches the first address, and  to all later lines until (and  including)
the next line that matches the second  address.  Then an attempt is  made
on later  lines to  again match  the first  address, and  the process  is
repeated.

Two addresses are separated by a comma.
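
The two-address rule, including the renewed search for the first address
after a range has closed, can be sketched as follows (assuming a POSIX
shell and sed; the variable names are illustrative, and the input is the
sample text of {#3.3}):

```shell
input='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# A pair of line numbers selects lines 2 through 4.
by_number=$(printf '%s\n' "$input" | sed -n '2,4p')

# A pair of context addresses: from a line starting with A or D through
# the next line ending in "an".  The range covers lines 2-3, then opens
# again at line 5; no later line closes it, so it runs to the end of
# the input.
by_context=$(printf '%s\n' "$input" | sed -n '/^[AD]/,/an$/p')

printf '%s\n' "$by_number" '--' "$by_context"
```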


4.4      EXAMPLES


     /an/              matches lines 1, 3, 4 in our sample text
     /an.*an/          matches line 1
     /^an/             matches no lines
     /./               matches all lines
     /\./              matches line 5
     /r*an/            matches lines 1, 3, 4 (the number of r's is zero!)
     /\(an\).*\1/      matches line 1
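
The table above can be checked mechanically.  The sketch below assumes a
POSIX shell, sed, and the paste utility; "matches" is a hypothetical
helper, not part of sed.  It uses the = function (see {#5.7}) with the -n
flag to list the numbers of the lines that each context address selects.

```shell
input='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# Print, on one line, the numbers of the lines selected by /$1/.
matches() {
    printf '%s\n' "$input" | sed -n "/$1/=" | paste -s -d ' ' -
}

for pattern in 'an' 'an.*an' '^an' '.' '\.' 'r*an' '\(an\).*\1'; do
    printf '%-14s %s\n' "/$pattern/" "$(matches "$pattern")"
done
```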




5.    FUNCTIONS

All functions are named by a single character.  In the following summary,
the  maximum  number  of   allowable  addresses  is  given  enclosed   in
parentheses, then the single character function name, possible  arguments
enclosed in angles '<>',  an expanded English  translation of the  single
character name, and  finally a  description of what  each function  does.
The angles around the arguments are not part of the  argument, and should
not be typed in editing commands.


5.1      WHOLE LINE ORIENTED FUNCTIONS

(2)d -- delete lines
     The d function deletes from the file (does not write to the  output)
     all those lines matched  by its address(es).   It also has the  side
     effect that no further  commands are  attempted on the  corpse of  a
     deleted line; as soon as  the d function is executed, a new line  is
     read from the input, and the  list of editing commands is  restarted
     from the beginning on the new line.

(2)n -- next line
     The n function reads  the next  line from the  input, replacing  the
     current line.   The current  line  is written  to the  output if  it
     should be.  The list of editing commands is continued following  the
     n command.

(1)a\
<text> -- append lines
     The a function causes the argument <text> to be written to the
     output after the line matched by its address.  The a command is
     inherently multiline; the 'a\' must appear at the end of a line, and
     <text> may contain any number of lines.  To preserve the
     one-command-to-a-line fiction, the interior new-lines must be hidden
     by a backslash character ('\') immediately preceding the new-line.
     The <text> argument is ended by the first unhidden new-line (the
     first one not immediately preceded by backslash).  Once an a
     function is successfully executed, <text> will be written to the
     output regardless of what later commands do to the line that
     triggered it.  The triggering line may be deleted entirely; <text>
     will still be written to the output.  The <text> is not scanned for
     address matches, and no editing commands are attempted on it.  It
     does not cause any change in the line number counter.

(1)i\
<text> -- insert lines
     The i function behaves  identically to the  a function, except  that
     <text> is written to the output before the matched line.   All other
     comments about the a function apply to the i function as well.

(2)c\
<text> -- change lines
     The c function deletes  the lines selected  by its address(es),  and
     replaces them with  the lines in <text>.   Like a and  i, c must  be
     followed by a new-line hidden by a backslash; and interior new-lines
     in <text> must be hidden by backslashes.  The c command may have two
     addresses, and therefore select a range  of lines.  If it does,  all
     the lines in the range  are deleted, but only one copy of <text>  is
     written to the output, not one copy per line deleted.  As with a and
     i, <text> is not  scanned for address  matches, and no editing  com-
     mands are  attempted on  it.  It  does not  change the  line  number
     counter.  After a line has been deleted by a c  function, no further
     commands are attempted on the corpse.   If text is appended after  a
     line by a or  r functions, and the  line is later changed, the  text
     inserted by the c function will be placed  before the text of the  a
     or r functions.  (The r function is described in {#5.3}.)

Note: Within  the text  put in  the output  by these  functions,  leading
blanks and tabs will disappear, as always in sed commands.   To get lead-
ing blanks and tabs into the output,  precede the first desired blank  or
tab by a backslash; the backslash will not appear in the output.


5.1.1       EXAMPLES

The list of editing commands:

     n
     a\
     XXXX
     d

applied to our standard input, produces:

     In Xanadu did Kubla Khan
     XXXX
     Where Alph, the sacred river, ran
     XXXX
     Down to a sunless sea.

In this particular case, the same effect  would be produced by either  of
the two following command lists:

     n              n
     i\             c\
     XXXX           XXXX
     d
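
The two-address form of the c function, described in {#5.1}, can be
sketched in the same way (assuming a POSIX shell and sed; the variable
names are illustrative).  The range is deleted as a whole and <text> is
written once.

```shell
input='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# Lines 2 through 4 are deleted; XXXX is written once, not three times.
changed=$(printf '%s\n' "$input" | sed '2,4c\
XXXX')

printf '%s\n' "$changed"
```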



5.2      SUBSTITUTE FUNCTION

One important  function changes  parts  of lines  selected by  a  context
search within the line.

(2)s<pattern><replacement><flags> -- substitute
     The s function replaces part of a line (selected by <pattern>) with
     <replacement>.  It can best be read:

          Substitute for <pattern>, <replacement>.

     The <pattern> argument contains a pattern, exactly like the patterns
     in addresses; see {#4.2}.  The only difference between <pattern> and
     a context address is that the context address must be delimited by
     slash ('/') characters; <pattern> may be delimited by any character
     other than space or new-line.  By default, only the first string
     matched by <pattern> is replaced, but see the g flag below.

     The <replacement>  argument  begins  immediately  after  the  second
     delimiting character of <pattern>, and must be followed  immediately
     by another instance of  the delimiting character.   (Thus there  are
     exactly three instances of the delimiting character.)  The <replace-
     ment> is not a pattern, and the characters that are special in  pat-
     terns do not have special meaning in <replacement>.  Instead,  other
     characters are special:  '&' is  replaced by the  string matched  by
     <pattern>; '\d' (where d is  a single digit) is replaced by the  dth
     substring matched by parts of  <pattern> enclosed in '\(' and  '\)'.
     If nested substrings occur  in <pattern>, the  dth is determined  by
     counting opening delimiters ('\(').  As in patterns, special charac-
     ters may be made literal by preceding them with backslash ('\').

     The <flags> argument may contain the following flags:

     g    Substitute <replacement> for all (non-overlapping) instances of
          <pattern> in the  line.  After  a successful substitution,  the
          scan for the next instance  of <pattern> begins just after  the
          end of the  inserted characters; characters  put into the  line
          from <replacement> are not rescanned.

     p    Print the line  if a  successful replacement was  done.  The  p
          flag causes the line to be written to the output if and only if
          a substitution  was made  by the  s function.   Notice that  if
          several s functions,  each followed by  a p flag,  successfully
          substitute in the same input line, multiple copies of the  line
          will be written to the output, one for each successful  substi-
          tution.

     w file
          Write the line to  file if a  successful replacement was  done.
          The w flag causes lines that are substituted by the  s function
          to be written to a file named  by file.  If file exists  before
          sed is run, it is overwritten; if not, it is created.  A single
          space must separate w and file.  The possibilities of multiple,
          somewhat different copies of  one input line being written  are
          the same as for p.  A maximum of 20 different file names may be
          mentioned after w flags and w functions (see below), combined.
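
The special characters of <replacement> and the choice of delimiter can
be illustrated by a short sketch (assuming a POSIX shell and sed; the
variable names are illustrative):

```shell
line='In Xanadu did Kubla Khan'

# & stands for the whole string matched by <pattern>.
bracketed=$(printf '%s\n' "$line" | sed 's/Kubla/[&]/')

# \1 and \2 stand for the strings matched by the first and second
# \( \) groups, counted by their opening delimiters.
swapped=$(printf '%s\n' "$line" | sed 's/\(Kubla\) \(Khan\)/\2 \1/')

# A backslash makes & literal in the replacement.
literal=$(printf '%s\n' "$line" | sed 's/did/\&/')

# Any delimiter other than space or new-line may be used; the g flag
# replaces every non-overlapping match.
all_an=$(printf '%s\n' "$line" | sed 's,an,AN,g')

printf '%s\n' "$bracketed" "$swapped" "$literal" "$all_an"
```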


5.2.1       EXAMPLES

The following command, applied to our standard input,

     s/to/by/w changes

produces, on the standard output:

     In Xanadu did Kubla Khan
     A stately pleasure dome decree:
     Where Alph, the sacred river, ran
     Through caverns measureless by man
     Down by a sunless sea.

and, on the file 'changes':

     Through caverns measureless by man
     Down by a sunless sea.

If the nocopy option (the -n flag; see {#3}) is in effect, the command:

     s/[.,;?:]/*P&*/gp

produces:

     A stately pleasure dome decree*P:*
     Where Alph*P,* the sacred river*P,* ran
     Down to a sunless sea*P.*

Finally, to illustrate the effect of the g flag, the command:

     /X/s/an/AN/p

produces (assuming nocopy mode):

     In XANadu did Kubla Khan

and the command:

     /X/s/an/AN/gp

produces:

     In XANadu did Kubla KhAN


5.3      INPUT/OUTPUT FUNCTIONS

(2)p -- print
     The print function writes the addressed lines to the standard output
     file.  They are written at  the time the p function is  encountered,
     regardless of what succeeding editing commands may do to the lines.

(2)w file -- write to file
     The write function writes the addressed  lines to the file named  by
     file.  If the file previously existed, it is overwritten; if not, it
     is created.  The lines are  written exactly as  they exist when  the
     write function  is encountered  for  each line,  regardless of  what
     later editing  commands may  do  to them.   Exactly one  space  must
     separate the w  and file.  A  maximum of 20  different files may  be
     mentioned in write  functions and  w flags after  s functions,  com-
     bined.

(1)r file -- read the contents of file
     The read function reads the contents of file, and appends them after
     the line  matched by  the address.   The file is  read and  appended
     regardless of  what  later editing  commands  do to  the  line  that
     matched its address.  If r and a functions are executed  on the same
     line, the text from the a functions  and the r functions is  written
     to the output in the order that the functions are executed.  Exactly
     one space must separate the r and file.  If a file mentioned by an
     r function cannot be opened, it is considered a null file, not an
     error, and no diagnostic is given.

Note:  Since there is a limit to the number  of files that can be  opened
simultaneously, care should be taken  that no more than 20 files be  men-
tioned in w functions or  flags; that number is reduced  by one if any  r
functions are present.  (Only one read file is open at one time.)
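
The difference between the w function and the w flag of the s function
can be sketched as follows (assuming a POSIX shell, sed, and the mktemp
utility; the file names an.txt and changes are made up for the
illustration).  Each line is written to a file exactly as it exists when
the write is encountered.

```shell
input='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

dir=$(mktemp -d)

# The w function copies every line containing "an" before the later
# substitution is applied; the w flag writes only substituted lines.
printf '%s\n' "$input" |
    sed -e "/an/w $dir/an.txt" -e "s/to/by/w $dir/changes" > /dev/null

an_lines=$(cat "$dir/an.txt")
changes=$(cat "$dir/changes")
rm -rf "$dir"

printf '%s\n' "$an_lines" '--' "$changes"
```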

5.3.1       EXAMPLES

Assume that the file 'note1' has the following contents:

     Note:  Kubla Khan (more properly Kublai Khan; 1216-1294)
     was the grandson and most eminent successor of Genghiz
     (Chingiz) Khan, and founder of the Mongol dynasty in China.

Then the following command:

     /Kubla/r note1

produces:

     In Xanadu did Kubla Khan
     Note:  Kubla Khan (more properly Kublai Khan; 1216-1294)
     was the grandson and most eminent successor of Genghiz
     (Chingiz) Khan, and founder of the Mongol dynasty in China.
     A stately pleasure dome decree:
     Where Alph, the sacred river, ran
     Through caverns measureless to man
     Down to a sunless sea.



5.4      MULTIPLE INPUT LINE FUNCTIONS

Three functions, all spelled  with capital letters,  deal specially  with
pattern spaces containing imbedded  new-lines; they are intended  princi-
pally to provide pattern matches across lines in the input.

(2)N -- Next line
     The next input line is appended to  the current line in the  pattern
     space; the two input  lines are separated  by an imbedded  new-line.
     Pattern matches may extend across the imbedded new-line(s).

(2)D -- Delete first part of the pattern space
     Delete up  to and  including  the first  new-line character  in  the
     current pattern space.  If the pattern space becomes empty (the only
     new-line was  the terminal  new-line), read  another line  from  the
     input.  In any case, begin  the list of editing commands again  from
     its beginning.

(2)P -- Print first part of the pattern space
     Print up to and including the first new-line in the pattern space.

The P and D functions are equivalent to their lower-case counterparts  if
there are no imbedded new-lines in the pattern space.
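
Two sketches of these functions follow, assuming a POSIX shell and sed;
the variable names are illustrative.  Both use the ! function of {#5.6}
to skip N on the last line, where there is no further input to append.
The first joins lines in pairs; the second is a common idiom, not taken
from this document, that drops a line when the next line is identical.

```shell
input='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# N appends the next line; the imbedded new-line is then replaced by a
# space, so lines are joined two at a time.
joined=$(printf '%s\n' "$input" | sed -e '$!N' -e 's/\n/ /')

# Keep two lines in the pattern space.  P prints the first of them
# unless both are identical; D then removes the first and restarts the
# command list with what is left.
unique=$(printf '%s\n' a a b c c |
    sed -e '$!N' -e '/^\(.*\)\n\1$/!P' -e 'D')

printf '%s\n' "$joined" '--' "$unique"
```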

5.5      HOLD AND GET FUNCTIONS

Five functions save and retrieve part of the input for possible later
use.

(2)h -- hold pattern space
     The h function copies the contents of the pattern space into a hold
     area (destroying the previous contents of the hold area).

(2)H -- Hold pattern space
     The H function appends the contents of the pattern space to the con-
     tents of the hold area; the former and new contents are separated by
     a new-line.

(2)g -- get contents of hold area
     The g function copies the contents of the hold area into the pattern
     space (destroying the previous contents of the pattern space).

(2)G -- Get contents of hold area
     The G function appends the contents of the hold area to the contents
     of the pattern space; the former and new contents are separated by a
     new-line.

(2)x -- exchange
     The exchange command interchanges the contents of the pattern  space
     and the hold area.


5.5.1       EXAMPLES

The commands

     1h
     1s/ did.*//
     1x
     G
     s/\n/  :/

applied to our standard example, produce:

     In Xanadu did Kubla Khan  :In Xanadu
     A stately pleasure dome decree:  :In Xanadu
     Where Alph, the sacred river, ran  :In Xanadu
     Through caverns measureless to man  :In Xanadu
     Down to a sunless sea.  :In Xanadu
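
As a further sketch of the hold area (a common idiom rather than
something taken from this document; it assumes a POSIX shell and sed,
and uses the ! function of {#5.6}), the input can be written out in
reverse order by accumulating it in the hold area:

```shell
input='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# On every line but the first, G appends the lines saved so far after
# the current line; h saves the result.  On the last line the pattern
# space holds the whole input in reverse order and is printed.
reversed=$(printf '%s\n' "$input" | sed -n -e '1!G' -e 'h' -e '$p')

printf '%s\n' "$reversed"
```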

5.6      CONTROL FLOW FUNCTIONS

These functions do no editing on the input lines, but control the  appli-
cation of functions to the lines selected by the address part.

(2)! -- Don't
     The Don't  command causes  the  next command  (written on  the  same
     line), to be applied to all and only those input  lines not selected
     by the address part.

(2){ -- Grouping
     The grouping  command '{'  causes the  next set  of commands  to  be
     applied (or not applied) as  a block to the input lines selected  by
     the addresses of the  grouping command.  The  first of the  commands
     under control of the grouping may appear on the same line as the '{'
     or on the next line.

     The group of commands is ended by a matching '}' standing on a  line
     by itself.

     Groups can be nested.

(0):<label> -- place a label
     The label function marks  a place  in the list  of editing  commands
     that may be referred  to by b and  t functions.  The <label> may  be
     any sequence of eight  or fewer characters;  if two different  colon
     functions have identical labels,  a compile time diagnostic will  be
     generated, and no execution attempted.

(2)b<label> -- branch to label
     The branch function causes  the sequence of  editing commands  being
     applied to the current input line to be restarted immediately  after
     the place where a colon function  with the same <label> was  encoun-
     tered.  If no colon function with the same label can  be found after
     all the editing commands have been compiled, a compile time diagnos-
     tic is produced, and no  execution is attempted.  A b function  with
     no <label> is taken to be a branch to the end of the list of editing
     commands; whatever should  be done  with the current  input line  is
     done, and another input line is  read; the list of editing  commands
     is restarted from the beginning on the new line.

(2)t<label> -- test substitutions
     The t function tests whether any successful substitutions have  been
     made on the current  input line; if  so, it branches to <label>;  if
     not, it does nothing.  The flag meaning that a successful  substitu-
     tion has been executed is reset by:

       *  reading a new input line, or
       *  executing a t function.
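
The control flow functions can be sketched together as follows, assuming
a POSIX shell and sed.  The variable names and the label "again" are
illustrative, and the digit-grouping loop is a common idiom rather than
something taken from this document.

```shell
input='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# ! applies p to the lines NOT selected by /an/.
not_an=$(printf '%s\n' "$input" | sed -n '/an/!p')

# { } applies a block of commands to the selected line only.
grouped=$(printf '%s\n' "$input" | sed -n '/Alph/{
s/sacred/holy/
p
}')

# b with no label skips the rest of the command list for lines
# containing "an"; the other lines are marked.
marked=$(printf '%s\n' "$input" | sed -e '/an/b' -e 's/^/> /')

# t branches back to the label for as long as the substitution
# succeeds, inserting one comma per pass.
commas=$(printf '%s\n' 1234567 | sed -e ':again' \
    -e 's/^\([0-9][0-9]*\)\([0-9][0-9][0-9]\)/\1,\2/' -e 't again')

printf '%s\n' "$not_an" '--' "$grouped" '--' "$marked" '--' "$commas"
```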

5.7      MISCELLANEOUS FUNCTIONS

(1)= -- equals
     The = function writes to the standard output the line number of  the
     line matched by its address.

(1)q -- quit
     The q function causes the current line  to be written to the  output
     (if it should be), any appended or read text to be written, and exe-
     cution to be halted.
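
A short sketch of both functions, assuming a POSIX shell and sed (the
variable names are illustrative):

```shell
input='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# With -n, $= writes only the number of the last line, that is, a count
# of the input lines.
count=$(printf '%s\n' "$input" | sed -n '$=')

# 2q copies the first two lines and halts; the rest is never read.
first_two=$(printf '%s\n' "$input" | sed 2q)

printf '%s\n' "$count" "$first_two"
```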

