------------------------------- Page    i -------------------------------

                     UTS Problem Determination Guide

------------------------------- Page   ii -------------------------------

                            TABLE OF CONTENTS


1.    Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . .   1

2.    Introduction  . . . . . . . . . . . . . . . . . . . . . . . . .   1

3.    Introduction to UTS . . . . . . . . . . . . . . . . . . . . . .   1

3.1      User Processes . . . . . . . . . . . . . . . . . . . . . . .   2

4.    File Systems  . . . . . . . . . . . . . . . . . . . . . . . . .   2

4.1      Special Files  . . . . . . . . . . . . . . . . . . . . . . .   3

5.    Kernel Organization . . . . . . . . . . . . . . . . . . . . . .   3

5.1      Interrupts . . . . . . . . . . . . . . . . . . . . . . . . .   4

6.    System Tables . . . . . . . . . . . . . . . . . . . . . . . . .   5

6.1      User, Proc . . . . . . . . . . . . . . . . . . . . . . . . .   5
6.2      Memory Structures  . . . . . . . . . . . . . . . . . . . . .   5
6.3      Block Device Tables  . . . . . . . . . . . . . . . . . . . .   6
6.4      Character Device Tables  . . . . . . . . . . . . . . . . . .   6
6.5      I/O tables . . . . . . . . . . . . . . . . . . . . . . . . .   6

7.    General Steps to Problem Determination  . . . . . . . . . . . .   7

7.1      Introduction . . . . . . . . . . . . . . . . . . . . . . . .   7
7.2      Is it VM?  . . . . . . . . . . . . . . . . . . . . . . . . .   7
7.3      Did UTS Panic? . . . . . . . . . . . . . . . . . . . . . . .   7
7.4      Is it Just Slow? . . . . . . . . . . . . . . . . . . . . . .   8
7.5      Kernel Debugging Techniques  . . . . . . . . . . . . . . . .   8

8.    The Dump Formatter  . . . . . . . . . . . . . . . . . . . . . .   9

9.    The Dskfix Program  . . . . . . . . . . . . . . . . . . . . . .  10

10.   Panic Messages With Explanations  . . . . . . . . . . . . . . .  10


                                                            Last Page  16

-------------------------------- Page  1 --------------------------------

1.    PURPOSE

This document is to be used by system programmers to determine the source
of problems on the UTS system.  It is assumed that  these system program-
mers have some familiarity with UTS and with VM/370.




2.    INTRODUCTION

This document contains information useful  to a system programmer who  is
trying to determine the cause of a UTS problem.  This  document is organ-
ized as follows:

  *  general explanation of UTS,

  *  key system tables,

  *  steps in determining the cause of a problem in a running or dead UTS
     system,

  *  how to use the dump formatter,

  *  how to use the dskfix program,

  *  table of UTS "panic" messages and their explanations.




3.    INTRODUCTION TO UTS

UTS is a multi-user time-sharing system which runs in a virtual  machine.
One UTS system can  support a number  of users who  are dialed in to  the
system on 3270 or ASCII  terminals.  The UTS kernel schedules  processes,
manages memory, controls I/O and handles interrupts.  The kernel runs  in
supervisor state in real memory while user processes run in problem state
in virtual memory.  (References in this manual to real and virtual memory
are relative to the UTS virtual machine.)

Each user process runs in its own  16 megabyte virtual address space;  it
does not share memory  with the system  or with other users (except  that
pure text can be shared between users  - see below).  The kernel  manages
the segment and page translation  tables for each user process and  allo-
cates pages from real memory as needed.  The kernel does not do  swapping

-------------------------------- Page  2 --------------------------------

or paging; it relies on CP paging to provide memory as needed.


3.1      USER PROCESSES

In UTS, the user address space starts with a text segment containing pure
code (which may be shared among processes via a common  segment - see the
"-n" option of ld(1)).  This is followed by a data segment which contains
global and  static  variables.   Above  this, the  process  may  allocate
storage dynamically via the sbreak(2) system call.  Then there is a large
gap to the stack segment, which starts at 16 megabytes and grows downward
automatically as  needed.  This  segment  is used  for  local  variables,
register save area, passing arguments to subroutines, etc.  The user pro-
cess requests system services via system  calls (SVCs), all of which  are
documented in volume 2 of the UTS Programmer's Manual.

The user process does I/O through the basic 8 I/O system calls:  open(2),
close(2), read(2),  write(2), seek(2),  ioctl(2),  creat(2), and  dup(2).
There is  only one  "access method":  all files  are considered  to be  a
stream of bytes.  There  is no concept  within the system of physical  or
logical record boundaries; in text files the newline character serves  to
delimit lines (of any length, including zero), while in binary  files the
logical records are just what the program makes them.

For convenience in manipulating text  files, a "standard I/O library"  is
provided.  This is a collection of subroutines which provide a convenient
high level  interface to  the basic  system calls  described above.   For
example, there is a routine  printf which formats data according to  user
specifications.

In fact, files on disk are physically blocked 4K bytes per block, so pro-
grams which request 4K blocks will be a bit more efficient.  However, the
system assumes responsibility  for any blocking  and unblocking  required
for transfers of a different size if the user program requests it.
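
As a minimal sketch of the byte-stream model described above, the routine
below counts the lines in a file using only open, read and close.  It  is
written against a present-day POSIX C library rather than UTS itself, and
the names count_lines and BLOCK_SIZE are  illustrative  assumptions.  The
4096-byte buffer simply mirrors the 4K disk block size mentioned above.

```c
#include <fcntl.h>
#include <unistd.h>

#define BLOCK_SIZE 4096   /* mirrors the 4K disk block size described above */

/* Count newline characters in a file, treating it as a plain byte stream.
 * Returns the count, or -1 if the file cannot be opened or read. */
long count_lines(const char *path)
{
    char buf[BLOCK_SIZE];
    long lines = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] == '\n')
                lines++;
    }
    close(fd);
    return n < 0 ? -1 : lines;
}
```

A zero-length line is just two adjacent newline characters, so it is
counted like any other line.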




4.    FILE SYSTEMS

A file system on a disk is a collection of 4K blocks.  The first is,  for
historical reasons, not used.  The second contains the "superblock" which
contains the size of the disk, the list of free data blocks, the time  of
last use, etc.  The next  n blocks (n is defined in the superblock)  con-
tain the i-list, which is  a linear list of i-nodes.   Every file has  an
i-node which contains the  owner of the  file, the access permissions  of
the file, the type of the file, the number of links to the file, date  of
last access, the number  of bytes in  the file, and the blocks  allocated

-------------------------------- Page  3 --------------------------------

for the file.  A directory file contains  pairs of file names and  i-node
numbers; a  file can  have  many names  simply by  having many  directory
entries (see ln(1) and link(2)).  Every file system has a root  directory
which is the ancestor of all files and directories on the file system.

For example, consider a file named /usr/bin.  The initial '/' means  that
the search starts at the root of the file system.  The root directory  is
searched for an entry called 'usr', and that directory is  then  searched
for an entry named 'bin'.  This is the target file.
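
The lookup just described can be sketched as a loop over path components.
This is a toy model, not the kernel's code: the structure dirent_toy, the
table of entries, and the choice of 1 as the root i-node number  are  all
assumptions made for illustration.

```c
#include <string.h>

#define ROOT_INO 1   /* assumed root i-node number for this sketch */

/* Hypothetical directory entry: a name paired with an i-node number.
 * dir_ino records which directory the entry lives in. */
struct dirent_toy {
    const char *name;
    int         ino;
    int         dir_ino;
};

static const struct dirent_toy entries[] = {
    { "usr", 2, ROOT_INO },
    { "dev", 3, ROOT_INO },
    { "bin", 4, 2 },
};

/* Search one directory for one name; return its i-node number or 0. */
static int dir_search(int dir_ino, const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof entries / sizeof entries[0]; i++)
        if (entries[i].dir_ino == dir_ino &&
            strlen(entries[i].name) == len &&
            memcmp(entries[i].name, name, len) == 0)
            return entries[i].ino;
    return 0;
}

/* Resolve an absolute path one component at a time, starting at the root.
 * Returns the i-node number of the target, or 0 if it is not found. */
int lookup(const char *path)
{
    int ino = ROOT_INO;

    if (*path != '/')
        return 0;               /* this sketch handles absolute paths only */
    while (*path) {
        while (*path == '/')
            path++;
        if (*path == '\0')
            break;
        size_t len = strcspn(path, "/");
        ino = dir_search(ino, path, len);
        if (ino == 0)
            return 0;           /* component not found */
        path += len;
    }
    return ino;
}
```

The sketch does not check that intermediate components are directories; a
real implementation would consult the file type stored in the i-node.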


4.1      SPECIAL FILES

UTS uses the mechanism  of special  files to provide  access directly  to
devices.  There  are two  types  of special  files: block  special  files
(disk), and character special files  (terminals and everything else).   A
special file contains a  UTS device number which  is made up of two  com-
ponents: a "major device number" and a "minor device number".  The  major
device number  is used  by the  kernel to  index into a  table of  device
drivers, while the minor device number is passed to the driver to  enable
it to  distinguish among  several devices  of the same  type.  There  are
separate tables  for block  and character  devices so  the numbers  never
clash.  Thus a magnetic tape  may be named "/dev/tape/mt1", where the  i-
node "/dev/tape/mt1" is a  character special file  with a certain  device
number.
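
The following sketch shows one way a major number can select a driver and
a minor number can be handed to it.  The byte layout of the device number
(major in the high byte, minor in the low byte), the structure chr_driver
and the stand-in drivers are assumptions for illustration; the real table
is cdevsw, and its layout is not given in this document.

```c
/* Hypothetical encoding for this sketch: major number in the high byte,
 * minor number in the low byte.  The real UTS layout is not stated here. */
#define DEV_MAJOR(dev)      (((dev) >> 8) & 0xff)
#define DEV_MINOR(dev)      ((dev) & 0xff)
#define DEV_MAKE(maj, min)  ((((maj) & 0xff) << 8) | ((min) & 0xff))

/* One switch-table entry per driver; a real table has more entry points. */
struct chr_driver {
    int (*d_open)(int minor);
};

/* Stand-in drivers that just return a recognizable value. */
static int tty_open(int minor)  { return 200 + minor; }
static int tape_open(int minor) { return 100 + minor; }

static const struct chr_driver chr_switch[] = {
    { tty_open },    /* major 0 (illustrative assignment) */
    { tape_open },   /* major 1 (illustrative assignment) */
};
#define NCHRDEV ((int)(sizeof chr_switch / sizeof chr_switch[0]))

/* Pick the driver by major number and hand it the minor number. */
int chr_open(int dev)
{
    int maj = DEV_MAJOR(dev);

    if (maj >= NCHRDEV)
        return -1;              /* no driver configured for this major */
    return chr_switch[maj].d_open(DEV_MINOR(dev));
}
```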




5.    KERNEL ORGANIZATION

If you want to really understand the kernel  there is only one way:  read
the code.  The kernel is amazingly well-written, so this is not nearly as
difficult as it might seem.  This section will give a general overview of
the organization of  the kernel,  but there is  really no substitute  for
reading the code.

The kernel source on the distributed system is all under  "/usr/src/sys".
There are four directories:

h    contains the header files for the kernel,

conf contains the configuration modules and zero.s,

dev  contains the device drivers and associated I/O modules,

sys  contains  the  system  calls,  scheduling,  task  switching,  memory
     management, etc.

-------------------------------- Page  4 --------------------------------

The best way to begin is with zero.s, the assembler module which  resides
in low memory and receives  control after any interrupt.  In the IBM  370
architecture, an interrupt loads a new PSW  from a fixed location in  low
memory.  The new  PSW contains  an instruction address  in zero.s.   Zero
saves the registers, gets a stack pointer  if necessary, and calls the  C
routine trap.  The kernel stack for each user process resides in the user
structure (discussed  below) for  that process,  so zero  sets the  stack
pointer (register 13) to point in that area.


5.1      INTERRUPTS

The trap routine first does some  CPU time accounting, then decides  what
kind of interrupt  it has and  switches out to  a second level  interrupt
handler.  Some comments on each type of interrupt are appropriate:

 1.  SVC interrupts.  When a user process wants to request a service from
     the system it issues an SVC, which causes an SVC  interrupt.  All of
     the system calls defined in section 2 of the UTS Programmer's Manual
     turn into SVCs.

 2.  Program interrupts.  If a  program interrupt occurs  in a user  pro-
     cess, it usually  turns into  some type of  signal (see  signal(2)).
     Signals are handled by various routines in the sys directory.

     If a program interrupt occurs in the kernel, the kernel panics.  The
     kernel panics on any unrecoverable error, writing a dump to the dump
     file
     if possible, flushing the  buffers if possible,  and then  resetting
     the system.  The system does  not try to automatically bring  itself
     back up.

 3.  I/O interrupts.  I/O interrupts are  handled in iointr.c in the  dev
     directory.  When appropriate, device  driver interrupt routines  are
     called to process the interrupt.   There is an extensive  discussion
     of the I/O system  in The UTS I/O  System in Documents For Use  With
     The UTS Time-sharing System.

 4.  External interrupts.  These are timer (and VMCF) interrupts, and are
     handled in clock.c.

 5.  Machine checks.  Since UTS runs under VM, it never sees any  machine
     checks.

One interesting feature of SVC and program interrupts is that the  system
does not necessarily  complete processing of  the interrupt and  directly
return, as it does with other kinds of interrupts.  Instead, the  process
may need to wait (e.g. for I/O completion).  This is  done by calling the
sleep routine, which puts the process to sleep (suspends its  execution),
and selects another process to run.  Later, a wakeup call will again make
the process eligible for execution.
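
The sleep and wakeup pairing can be illustrated with a  toy  model.  This
sketch assumes the traditional UNIX convention that a  sleeping  process
records the address of the event it is waiting for; the names  proc_toy,
toy_sleep and toy_wakeup are hypothetical, and no scheduling is done.

```c
#include <stddef.h>

enum pstate { P_RUN, P_SLEEP };

/* Hypothetical per-process record for this sketch. */
struct proc_toy {
    enum pstate state;
    const void *wchan;   /* event the process is waiting for, if asleep */
};

#define NPROC 4
static struct proc_toy procs[NPROC];

/* Mark a process as waiting on an event (identified by an address). */
void toy_sleep(struct proc_toy *p, const void *chan)
{
    p->state = P_SLEEP;
    p->wchan = chan;
}

/* Make every process waiting on the event runnable; return how many. */
int toy_wakeup(const void *chan)
{
    int n = 0;

    for (int i = 0; i < NPROC; i++) {
        if (procs[i].state == P_SLEEP && procs[i].wchan == chan) {
            procs[i].state = P_RUN;
            procs[i].wchan = NULL;
            n++;
        }
    }
    return n;
}
```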

-------------------------------- Page  5 --------------------------------

The kernel runs disabled for external and I/O interrupts.  The  (virtual)
machine is enabled for these interrupts only when a user  process is run-
ning, the window routine is  called or the idle  routine is called.   The
window routine is called  during scheduling for  the purpose of  allowing
any pending interrupts to come through.  The idle routine is called  when
UTS has nothing to  do; it puts the  (virtual) machine into a wait  state
waiting for an interrupt.




6.    SYSTEM TABLES

This section will present an overview of the various tables used to store
all the  key  data needed  by  the kernel.   All  the kernel  tables  are
declared in header files in the "h"  directory.  Once again it should  be
emphasized: there is no  substitute for reading  the code.  This  section
does not attempt to cover every entry in every table; only  the  overall
purpose and use of each table are discussed.


6.1      USER, PROC

Each user process has  a proc and  a user structure  associated with  it.
These structures (there  are two  because the user  structure used to  be
swapped out during  swapping, while  the proc  structure wasn't)  contain
data such as the  user id, the  CPU time consumed, priority, pointers  to
open files, a pointer to current directory, pointers to the user's pages,
etc.  The user and  proc structures contain  pointers to each other,  and
the proc structures are linked  together by the  scheduler when they  are
runnable.  In addition, there is  a pointer in low memory (usually  loca-
tion 512) which points  to the  user structure of  the currently  running
process.  The user structure is stored in the low part  of a page fetched
for that purpose.  The high part of the page is used for a run-time stack
for the kernel when it is servicing interrupts during that process.


6.2      MEMORY STRUCTURES

At IPL  time the  kernel  builds the  coretab structure  just  above  the
kernel's data segment.   This structure  contains a linked  list of  free
pages, and also the storage  key for every  page in memory.   Remembering
the storage key means not having to set it so  often, a significant effi-
ciency improvement when running under VM/370.
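
The idea of remembering the storage key can be sketched as follows.  The
structure core_entry, the table size and the counter key_sets are assump-
tions for illustration only; the real coretab layout is defined  in  the
kernel header files.  The counter stands in for the privileged operation
that sets a storage key.

```c
#define NPAGES 8          /* tiny memory, for illustration only */
#define NOPAGE (-1)

/* Hypothetical per-page entry: free-list link plus remembered key. */
struct core_entry {
    int           next;   /* index of next free page, or NOPAGE */
    unsigned char key;    /* storage key last set for this page */
};

static struct core_entry coretab[NPAGES];
static int free_head = NOPAGE;
static int key_sets;      /* counts simulated set-storage-key operations */

/* Build the free list so that page 0 is at its head. */
void core_init(void)
{
    free_head = NOPAGE;
    key_sets = 0;
    for (int i = NPAGES - 1; i >= 0; i--) {
        coretab[i].key = 0;
        coretab[i].next = free_head;
        free_head = i;
    }
}

/* Take a page off the free list.  The key is set only when it differs
 * from the remembered one, which is the saving described above. */
int page_alloc(unsigned char key)
{
    int pg = free_head;

    if (pg == NOPAGE)
        return NOPAGE;
    free_head = coretab[pg].next;
    if (coretab[pg].key != key) {
        coretab[pg].key = key;
        key_sets++;
    }
    return pg;
}

/* Return a page to the head of the free list; its key is remembered. */
void page_free(int pg)
{
    coretab[pg].next = free_head;
    free_head = pg;
}
```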

The memory management routines in sys/mem.c also allocate segment  tables
and page tables for user  processes as needed.  These tables are used  by
the address translation hardware, and also by the system to keep track of

-------------------------------- Page  6 --------------------------------

which pages  are allocated.   These  tables are  pointed to  by the  user
structure.


6.3      BLOCK DEVICE TABLES

Block devices are disks which are formatted in fixed length blocks and on
which file  systems are  stored.   The bdevsw  table in  the  conf/conf.c
module is used by the kernel to locate the appropriate device driver  for
each block device.  The index into this table is the  major device number
from the block special file i-node (see above).

Each block device has a devtab table  which points to the blocks  associ-
ated with that device.  Each block in memory has a structure buf (defined
in h/buf.h) associated with it.  This structure is doubly linked both  on
a device queue and a freelist queue.  See The UTS I/O System for a fuller
explanation.

There is an i-node table which  contains in-core versions of all  i-nodes
currently open, and a mount table which contains data about  each mounted
file system.


6.4      CHARACTER DEVICE TABLES

Character devices are tty's and any  other devices which don't fall  into
the category  of  block  devices.   This includes  the  console,  reader,
printer, punch, memory, vmcf, and tape.  The cdevsw table, indexed by the
major device number from  the i-node, is  used to locate the  appropriate
device driver as needed.

Each terminal has  a tty  structure associated with  it, which  remembers
whether double  character translation  is necessary, what  the erase  and
kill characters are, and contains pointers to linked lists of  characters
on the input and output queues for the terminal.  Each 3270 type terminal
also has a tube structure containing pf key settings, I/O buffers, etc.


6.5      I/O TABLES

The I/O supervisor has  a structure  for each channel,  control unit  and
device in the system.  These structures contain the current status of the
device, and also point to  queued I/O for the device.   There is also  an
I/O queue element, called  an ioq, for  each I/O operation pending or  in
progress.  See The UTS I/O System for more details on the I/O subsystem.

-------------------------------- Page  7 --------------------------------

7.    GENERAL STEPS TO PROBLEM DETERMINATION

This section will outline  some useful  steps to be  performed by  system
programmers who are trying to determine the source of a  problem relating
to the UTS system.


7.1      INTRODUCTION

What is the problem?  This is the key question.  If the problem is a mal-
function in the editor, compiler,  nroff, etc., this is not the  document
for you to look at.  If the problem is a system that is not responding to
user input, a hung process, or a hung device, then read on.

This document will proceed  from the  most serious problems  to the  less
serious ones.  In each case,  the description will include the  symptoms,
the cause, and the cure.  There will be some cures that are easy,  others
that are hard, and  some cases where we  don't really even know what  the
problem is.


7.2      IS IT VM?

If you have a completely dead system, that is, one which is not responding
to any input (not  even turning off input  inhibit on a 3270), the  first
thing to do is check out VM.  The best way to do this is to find  another
terminal and try to  do something on VM.   For example, you might try  to
log on to CMS.  If you cannot do this, the thing to do is to get VM  run-
ning again before looking any  further at UTS.  Getting VM running  again
is outside the scope of this document (see your VM systems programmer).


7.3      DID UTS PANIC?

If VM is running and UTS is dead, the next  step is to log on to the  UTS
virtual machine  (i.e.,  the virtual  console).   Issue a  "#CP D  P"  to
display the virtual PSW.  If it  is 00020000 00000000, UTS probably  pan-
icked.  UTS will panic  when it discovers  an unrecoverable error  within
itself.  This of course should  not happen; nonetheless  it does.  If  it
did panic, then all the users should have been thrown off the system.  If
they weren't, it probably wasn't  a panic.  If it was  a panic, the  only
option you have is to  re-IPL the system.  You should issue a "#CP  SPOOL
CONS CLOSE" before you  do.  This  will make the  console file  available
after you enter multi-user mode after the IPL.  (The console file will be
read by the vmread program  and placed in the /usr/spool/rdr  directory.)
You may  then examine  the console  file to  determine the  cause of  the
panic.  One of the last lines of the console file should be a message  of
the form "panic: <panic message>".  Section 10 of this document  contains
all the panic  messages issued  by the  kernel, and  some explanation  of
their cause.
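
For those who collect PSW values from console logs, the check above  can
be expressed in a few lines.  In the System/370 PSW, bit 14 of the first
word is the wait-state bit, which is the 0002 seen in the value above.
The function names are hypothetical, and a match only suggests a panic;
the console file remains the authority.

```c
#include <stdint.h>

#define PSW_WAIT_BIT 0x00020000u  /* bit 14 of the first PSW word */

/* Return 1 if the virtual machine is in a wait state at all. */
int psw_is_waiting(uint32_t word0)
{
    return (word0 & PSW_WAIT_BIT) != 0;
}

/* Return 1 if the PSW is exactly 00020000 00000000: wait bit on, all
 * interrupt masks off, and a zero instruction address. */
int psw_looks_like_panic(uint32_t word0, uint32_t word1)
{
    return word0 == PSW_WAIT_BIT && word1 == 0;
}
```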

-------------------------------- Page  8 --------------------------------

The best way to determine  the underlying cause of the  panic is to  read
the code in the kernel that caused it to arise.  You can grep through the
kernel source for the panic message to find the appropriate code.


7.4      IS IT JUST SLOW?

It may be  that the  system is  just running  very slowly.   This can  be
caused by  a VM  overload  such that  VM is  not giving  the UTS  virtual
machine much time.  This can be very serious if there are many users  on
UTS since, as far as VM is concerned, UTS is just another virtual machine
like any CMS user.  One option to get around this  is  to  set  favored
status in VM for the  UTS virtual machine.  Consult your VM systems  pro-
grammer for how to do this.

To determine if it is  just slow, do a  "#CP TR SIO  I/O SVC RUN".   This
will cause CP to display all start I/O operations, all I/O interrupts and
all SVC calls on the console.  If the system is just slow, you should see
all of these things  displayed in some  order, but very slowly.   Another
thing to try is to do a "#CP TR EXT" and watch for timer interrupts  (EXT
1004).  These  should appear once  a second,  for that is  how often  UTS
requests them.  If they appear less often, it is probably due to VM being
slow.  The CPU timer interrupt  (EXT 1005) happens only when a user  pro-
cess exceeds its time slice (currently 1/16 sec.).  This does not seem to
happen often.  Eventually you will have to stop the tracing  with "#CP TR
END".

One good thing to try after doing the CP TRACE is to dial up on a  termi-
nal.  VM should present UTS  with an I/O interrupt at the device  address
you just dialed.  The interrupt  will be a device end,  which is a  (vir-
tual) power-on.  UTS  should respond  fairly quickly with  an SIO to  the
virtual address.  If it doesn't respond within a few seconds, then UTS is
probably very messed up, and you will have to re-IPL.

Another thing you should be seeing at  this point is some I/O  operations
to the disks.  The program /etc/update, which is normally running  in the
background, should be  doing a  sync every 30  seconds or  so.  The  sync
should at least result in the super-block being written to each disk.  If
there is no disk I/O for more than a minute or two, UTS is probably  very
messed up, and you will have to re-IPL.

If you want UTS to take a dump, to be  analyzed later with the dump  for-
matter (see below), the CP command "#CP SYSTEM RESTART" will cause UTS to
panic and take a dump.


7.5      KERNEL DEBUGGING TECHNIQUES

The purpose of this section is to give some hints to those who are  modi-
fying the standard UTS  kernel and need  some help debugging their  code.

-------------------------------- Page  9 --------------------------------

The most powerful tool for  debugging the kernel  is the printf  routine.
It is similar to the standard printf, but does not support floating point
or longs.  It is in sys/prf.c if you want to look at it.   Clearly,  the
question of what to print out and when is impossible to answer  in  gen-
eral; it depends on what you're trying to do and what is going wrong.

Sometimes it is desirable to  look at an assembler  listing of a  module.
The command "cc -S module.c" will produce a file "module.s"  which is the
assembler code generated by  the compiler for  the module.  Even  better,
"cc -c -L module.c" will  produce a file "module.lst" which contains  the
generated assembler,  the  preprocessed source  code  (commented  out  of
course), and  the hex  object generated  by the assembler  for the  code.
This together with a namelist  (generated by the  nm command) will  allow
you to determine what is where at runtime.




8.    THE DUMP FORMATTER

A very useful tool in determining problems with UTS is the dump formatter
program.  This program can examine either a live system or  a binary dump
and can format all the important system tables described above.

To use the dump formatter  on the live system,  simply enter the  command
"/etc/dump".  It  defaults to  looking  at /dev/mem,  which is  the  real
memory of UTS.  See the manual page for the specific format of the  vari-
ous commands accepted by the dump formatter.

Whenever the system panics, it tries to take a dump to the file "dump" in
the root directory of the "dump disk", usually at DD0.  If you want it to
panic and take a dump, the CP command  SYSTEM RESTART  will cause  this,
unless the  system is really  in a  terrible state.  If  it succeeds,  it
stores a hex 'D' in the fullword at location zero, so you know it worked.
You can then examine the  dump after bringing the system back up.  If  it
did not work, location zero will contain zero, and your only alternatives
are to not take a dump at all, or to  take a dump with the CP "DUMP" com-
mand.  Unfortunately, this dump can only be printed, and cannot be  exam-
ined by the dump formatter  or other UTS tools.  Therefore you will  have
to plow through the hex dump yourself.

If you want to do  this, you should first spool  the printer with the  CP
command "SPOOL PRT CLASS A  COPY 1 HOLD", then take the dump with the  CP
command "DUMP 0-END", then close the printer with "CLOSE PRT HOLD".  Then
you should close the console with "SPOOL CONS CLOSE", and re-IPL the sys-
tem.  After looking at the console file  which comes in through the  vir-
tual reader, you  must decide if you  want the dump.   If you do,  become
superuser (see su(1)), enter "/etc/cpmode", and do the CP command "CH PRT

-------------------------------- Page 10 --------------------------------

nnn NOHOLD", where  "nnn" is the  spool file number,  gotten from "Q  PRT
ALL".  If you decide you don't want the dump after all, you enter the  CP
command "PURGE PRT nnn"  to purge the  print file.  You should not  leave
the dump around for a great length of time since it takes up space on the
CP spool disk.




9.    THE DSKFIX PROGRAM

UTS maintains an in-memory cache of i-nodes  and blocks from disk.  As  a
result, if UTS goes down  after writing some but not all of this data  to
disk, the information on the disk  may be inconsistent.  Therefore it  is
necessary to check the  disks at every  IPL, and fix whatever errors  are
found.  The UTS utility dskfix performs this function.

The manual page filsys(3) explains in detail the format of a file  system
(disk).   Dskfix   incorporates   the  programs   icheck(3),   dcheck(3),
ncheck(3), clri(3), rm(1), mount(3) and umount(3) to perform the checking
and fixing functions.  For all but the most serious errors, the best pos-
sible recovery action is taken by dskfix.  There are some errors, such as
a seriously corrupted root disk, for which the recovery action  cannot be
determined except  by a  competent system  programmer.  In  these  cases,
dskfix gives up with a "fatal error" message.  When this happens, you can
either try fixing the error with clri(3),  mknod(3), etc., or you can  go
to a backup tape and restore the last good copy of the root disk that you
have.  Since the data  on the root  disk doesn't change  that often  (you
should have all your users on other disks), this should  not be a serious
loss.




10.   PANIC MESSAGES WITH EXPLANATIONS


sys/alloc.c             bad block

This message comes from the "badblock" routine in alloc.c.  This  routine
is called from  several locations to  check whether a  block number on  a
file system is within range.  If the block number is out of range of pos-
sible block numbers on a device, panic is called.  To  determine the ori-
ginal cause of the error  it is necessary to trace  back on the stack  to
find the routine which called "badblock" and then determine why that rou-
tine was using an invalid block number.
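
A range check of this kind might look like the sketch below.  The struc-
ture filsys_toy and its two fields are assumptions for illustration; the
actual limits are taken from the superblock, whose layout is  documented
elsewhere.

```c
/* Hypothetical subset of superblock data needed for the range check. */
struct filsys_toy {
    long first_data;  /* first block available for file data */
    long nblocks;     /* total number of blocks on the device */
};

/* Return 1 where the kernel would panic with "bad block", else 0. */
int block_out_of_range(const struct filsys_toy *fs, long bn)
{
    return bn < fs->first_data || bn >= fs->nblocks;
}
```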

-------------------------------- Page 11 --------------------------------

sys/asm.s               bad call to a_mvcl

The a_mvcl routine makes two checks on its arguments:  (1)  the  target
address must not be zero, which prevents  accidental  clearing  of  page
zero; (2) the from count and the to count must be the same, which  makes
the a_mvcl code simpler (this restriction could be removed if  desired).
If either check fails, you must look at the stack to determine what went
wrong and where a_mvcl was called, and determine why it was called  with
the wrong arguments.
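
The two argument checks can be sketched in C as shown below.  The real
routine is in assembler (sys/asm.s); the function name and parameters
here are hypothetical, and a null pointer stands in for address zero.

```c
#include <stddef.h>

/* Return 1 if a long-move request passes both checks, or 0 where the
 * kernel would panic with "bad call to a_mvcl". */
int mvcl_args_ok(const void *to, size_t to_count, size_t from_count)
{
    if (to == NULL)
        return 0;               /* target address zero: page zero at risk */
    if (to_count != from_count)
        return 0;               /* the two counts must be the same */
    return 1;
}
```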

sys/asm.s               bad call to copypag

The copypag routine checks  whether it is  being told to  copy a page  to
page zero.  If it is, it panics with this message, since page zero should
never be overwritten.  To find the error,  you must trace back the  stack
to find the  calling routine,  and then determine  why it called  copypag
with the wrong argument.

sys/asm.s               bad call to zeropag

The zeropag routine checks whether it is  being told to clear page  zero.
If it is, it  panics with this message,  since page zero should never  be
cleared.  To find the error,  you must trace back the  stack to find  the
calling routine, and then determine why it called zeropag with  the wrong
argument.

dev/bio.c               blkdev

The getblk routine  is used  by several higher  level routines  to get  a
block for i/o  to a block  device.  Getblk checks  the device number  for
validity, and if it  is out of  range, it panics  with this message.   To
find the error, you  must trace back the  stack to find the calling  rou-
tine, and then  determine why  it called  getblk with  an invalid  device
number.

sys/mem.c               copysgs -- zero to_pt

The copysgs routine is used  to copy groups of  pages.  The target  pages
must have already been allocated.  If they were not, copysgs  will find a
null pointer, and will panic with  this message.  This should never  hap-
pen.

dev/bio.c               devtab

In the course of allocating a block for block i/o, getblk must access the
device table in bdevsw.  If the pointer in bdevsw is  null, getblk panics
with this message.  To fix the error you should make sure there is a dev-
ice table pointer in the bdevsw table for the given device.

dev/dasd.c              disk: DE but channel program not done

-------------------------------- Page 12 --------------------------------

The disk driver, dasd.c, makes a number of consistency checks in the pro-
cessing of  channel programs.   One of  these is  to check  that the  csw
address points at the end of the channel program when device end is  sig-
nalled without a  unit check.  If  it does not,  dasd.c panics with  this
message.  Since this is a hardware error, there is nothing you can do  to
UTS to prevent it from happening again.

dev/dasd.c              disk: bad block number

The disk driver, dasd.c, checks to make sure the block number it is  try-
ing to read is within range.  If it is not,  dasd.c panics with this mes-
sage.  To fix the problem, trace back  to the routine requesting the  i/o
and determine why the wrong block number was passed.

dev/dasd.c              disk: error correction out of bounds

The disk driver, dasd.c,  applies error correction  data returned by  the
disk when  the hardware  detects a  software correctable  error.  If  the
error correction data is out of bounds, dasd.c panics with this  message.
Since this is a hardware  error, you cannot prevent a recurrence of  this
panic by changing UTS.

dev/dasd.c              disk: getnxtbp but dt->b_actf == 0

The disk driver, dasd.c, maintains a linked  list of blocks on which  i/o
has been requested but not  yet completed.  If a pointer in this list  is
null when  it shouldn't  be,  getnxtbp panics  with this  message.   This
should never happen.

dev/dasd.c              disk: getnxtbp but dt->b_actl == 0

The disk driver, dasd.c, maintains a linked  list of blocks on which  i/o
has been requested but not  yet completed.  If a pointer in this list  is
null when  it shouldn't  be,  getnxtbp panics  with this  message.   This
should never happen.

dev/dasd.c              disk: unexpected DE

The disk driver, dasd.c, has an internal flag which indicates whether i/o
is in progress to the disk.  If a device end interrupt happens while this
flag is zero, dasd.c panics with this message.  This should never happen.

dev/dasd.c              disk: writing on R/O disk

The disk driver, dasd.c, maintains an internal flag indicating whether  a
given disk is writable.  If a write is requested to  a non-writable disk,
dasd.c panics with this message.  Since this error should be caught at  a
higher level in the system, this panic should never happen.

dev/dasd.c              disk: zero data address

-------------------------------- Page 13 --------------------------------

The disk driver, dasd.c, checks the data address  where the i/o is to  be
done, and if it is  zero, panics with this message.  This is because  i/o
should never be done to page zero.  This should never happen.

sys/mem.c               freepag - not on page boundary

If the freepag routine is called to free a page that does not begin on  a
page boundary it panics with this message.  This should never happen.
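
A page-boundary test is a simple mask operation.  The sketch below  as-
sumes a 4096-byte page, which this document does not state; the  names
TOY_PAGE_SIZE and on_page_boundary are hypothetical.

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size; not stated in the text */

/* Return 1 if addr is the first byte of a page, else 0. */
int on_page_boundary(uintptr_t addr)
{
    return (addr & (TOY_PAGE_SIZE - 1)) == 0;
}
```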

dev/ioq.c               freeq - q freed twice

If the  freeq routine  is called  to free  an io  queue element  that  is
already free, it panics with this message.  This should never happen.
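
One common way to detect a double free is to keep an in-use flag in each
element.  The sketch below shows that technique; whether freeq uses  a
flag or some other test is not stated here, and the names ioq_toy  and
ioq_release are hypothetical.

```c
/* Hypothetical I/O queue element with an in-use flag. */
struct ioq_toy {
    int             in_use;
    struct ioq_toy *next;
};

static struct ioq_toy *ioq_free_list;

/* Put an element back on the free list.  Returns 0 on success, or -1
 * where the kernel would panic with "freeq - q freed twice". */
int ioq_release(struct ioq_toy *q)
{
    if (!q->in_use)
        return -1;              /* already free */
    q->in_use = 0;
    q->next = ioq_free_list;
    ioq_free_list = q;
    return 0;
}
```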

sys/main.c              iinit

The iinit routine, called at system initialization time, attempts to read
the super-block of the root disk.  If this read fails,  iinit panics with
this message.  The probable cause of the panic is an i/o error.  Attempt
to re-IPL the system; if the error recurs, the root disk is probably
damaged and should be restored from a backup tape.

dev/io2.c               iocc1 - csw indicates program check

The iocc1 routine, called when an io instruction gives condition code  1,
checks for a program  check indication in the  csw.  If it finds one,  it
panics with this message.  The solution is  to fix the driver which  gen-
erated the erroneous channel program.

dev/io2.c               iocc1 - csw indicates protection check

The iocc1 routine, called when an io instruction gives condition code  1,
checks for a protection check indication in the csw.  If it finds one, it
panics with this message.  The solution is  to fix the driver which  gen-
erated the erroneous channel program.
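
The two iocc1 checks can be pictured with the sketch below.  It assumes
the System/370 CSW layout, in which the channel status occupies bits 40-47
(byte 5 of the 8-byte CSW), with program check at bit 42 and protection
check at bit 43.  The function name, the return codes and the order of the
tests are hypothetical and are not taken from io2.c.

```c
#include <stdint.h>

/* Channel status byte of the CSW (bits 40-47), System/370 layout. */
#define CSW_PROGRAM_CHECK    0x20u   /* bit 42 */
#define CSW_PROTECTION_CHECK 0x10u   /* bit 43 */

/* Hypothetical result codes for this sketch. */
enum csw_fault { CSW_OK, CSW_PROG, CSW_PROT };

/* csw is the 8-byte channel status word; byte 5 is the channel status. */
static enum csw_fault csw_check(const uint8_t csw[8])
{
    if (csw[5] & CSW_PROGRAM_CHECK)
        return CSW_PROG;    /* kernel would panic: program check */
    if (csw[5] & CSW_PROTECTION_CHECK)
        return CSW_PROT;    /* kernel would panic: protection check */
    return CSW_OK;
}
```

Either result points at the channel program built by the driver, not at
the device itself.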

dev/ioq.c               ioenq - zero qnum

The ioenq  routine is  used to  place an  io queue  element on  a  queue.
Before doing this, ioenq checks  the validity of the q number.  If it  is
zero, ioenq panics with this message.  This should never happen.

dev/io2.c               iointr - csw indicates program check

The iointr routine, called to process an i/o interrupt, checks for a pro-
gram check indication in the  csw.  If it finds one, it panics with  this
message.  The solution is to fix the driver which generated the erroneous
channel program.

-------------------------------- Page 14 --------------------------------

dev/io2.c               iointr - csw indicates protection check

The iointr routine, called to process an i/o interrupt, checks for a pro-
tection check indication  in the csw.   If it finds  one, it panics  with
this message.  The  solution is  to fix  the driver  which generated  the
erroneous channel program.

dev/ioq.c               iotic - no device

The iotic routine, called by tic  to check for missing interrupts,  needs
to determine the appropriate  device if an  interrupt is missing.  If  no
device is found, iotic panics with this message.  This should never  hap-
pen, since the device structure  should have been located before the  i/o
instruction was issued.

dev/io1.c               no device

When a device driver calls  the sio, tio or hio  routine to issue an  i/o
instruction, the routine must  locate the appropriate device  structures.
If it cannot do so, it panics with this message.  This indicates an error
either in the device driver, which gave the wrong device number, or in
iotab.c, which contains the structures for the various devices.

sys/alloc.c             no fs

The getfs routine is called to locate  the super-block of a mounted  file
system.  If  the search  fails,  getfs panics  with this  message.   This
should never happen.

sys/iget.c              no imt

The routine iget  is called to  get an  i-node given a  device and  inode
number.  Iget may need to find a mounted file system in this process.  If
it cannot find the mount entry, it panics with this message.  This should
never happen.

dev/ioq.c               no ioq

The getq  routine allocates  the  next free  i/o queue  element  for  the
caller.  If there are no free queue elements, getq panics  with this mes-
sage.  The solution is to  increase the number of  i/o queue elements  by
increasing the NIOQ constant in ioq.c.
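
The sketch below shows how a fixed pool of queue elements can run out, and
how a second free of the same element (the "freeq - q freed twice" panic)
can be detected.  Only the NIOQ name comes from this guide; its value here,
the structure layout and the function names are assumptions, and the real
ioq.c may be organized differently.

```c
#include <stddef.h>

#define NIOQ 4                  /* illustrative value only */

struct ioq {
    struct ioq *next;           /* link on the free list */
    int in_use;                 /* 1 while owned by a caller */
};

static struct ioq pool[NIOQ];
static struct ioq *freelist;

/* Put every element of the pool on the free list. */
static void ioq_init(void)
{
    int i;

    freelist = NULL;
    for (i = 0; i < NIOQ; i++) {
        pool[i].in_use = 0;
        pool[i].next = freelist;
        freelist = &pool[i];
    }
}

/* Returns a free element, or NULL where the kernel would panic "no ioq". */
static struct ioq *getq_sketch(void)
{
    struct ioq *q = freelist;

    if (q == NULL)
        return NULL;
    freelist = q->next;
    q->in_use = 1;
    return q;
}

/* Returns 0 on success, -1 where the kernel would panic on a double free. */
static int freeq_sketch(struct ioq *q)
{
    if (!q->in_use)
        return -1;
    q->in_use = 0;
    q->next = freelist;
    freelist = q;
    return 0;
}
```

With a pool of this kind, raising NIOQ only postpones exhaustion if some
driver is failing to free its elements.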

sys/main.c              no memory for buffers

The binit routine, called at system initialization time, allocates memory
for block i/o  buffers.  If it runs  out of memory,  it panics with  this
message.  The solution is either to run in a larger virtual address
space or to reduce the number of buffers by reducing the NBUF constant
defined in param.h.

-------------------------------- Page 15 --------------------------------

sys/slp.c               no procs

During the fork process, the newproc routine is called to set up the  new
process.  If newproc cannot find  a slot in the process table, it  panics
with this message.  Since a  previous routine has  already checked for  a
free slot, this panic should never happen.

sys/sys1.c              process 1 died

If process 1 exits, the system panics with  this message.  The fix is  to
determine why process 1 died and make sure this condition cannot recur.

sys/slp.c               running a dead proc

The setrun routine is called to put a process in the run queue.  Setrun
checks to make sure the proc is runnable; if it isn't, setrun panics with
this message.  This should not happen.

sys/mem.c               sbreak1 - no pages

The sbreak1 routine is called to increase or decrease the number of pages
allocated to a process.  If  it is called to free pages and no pages  are
currently allocated, it panics with  this message.  This should not  hap-
pen.

sys/mem.c               sbreak1 - too few pages

The sbreak1 routine is called to increase or decrease the number of pages
allocated to a process.  If it is called to free  pages and too few pages
are currently allocated, it  panics with this  message.  This should  not
happen.
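
The two sbreak1 panics can be told apart as in the following sketch.  The
interface shown (a current page count and a signed change) and the return
codes are hypothetical; the real routine in sys/mem.c may take different
arguments.

```c
/* Hypothetical codes standing in for the two panics. */
enum { SB_NO_PAGES = -1, SB_TOO_FEW = -2 };

/*
 * allocated: pages the process currently holds.
 * delta:     pages to add (positive) or free (negative).
 * Returns the new page count, or a negative code where the kernel
 * would panic.
 */
static int sbreak1_sketch(int allocated, int delta)
{
    if (delta < 0) {
        if (allocated == 0)
            return SB_NO_PAGES;     /* "sbreak1 - no pages" */
        if (allocated < -delta)
            return SB_TOO_FEW;      /* "sbreak1 - too few pages" */
    }
    return allocated + delta;
}
```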

dev/io2.c               sense failed

When io2.c detects a  unit check condition  on a device,  it responds  by
doing a sense  command to that  device.  If the  sense command fails,  it
panics with this message.  This should not happen.

dev/io1.c               sio: zero caw

If a device  driver calls  the sio routine  with a  zero channel  address
word, sio panics with this message.  This should not happen.

sys/slp.c               sleeping on wchan 0

The sleep routine is called to give up control of the processor while
awaiting an event.  The event is identified by the number "wchan".  This
number should never be zero; if it is, sleep panics with this message.
This should never happen.
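
The sketch below suggests why a zero wchan cannot be allowed.  It follows
the classic UNIX arrangement, in which each process records the event it
is waiting for and a zero value means "not sleeping"; whether slp.c is
organized exactly this way is an assumption.  The table size and the
function names are hypothetical.

```c
#include <stddef.h>

#define NPROC 3                 /* illustrative table size */

struct proc {
    const void *p_wchan;        /* event awaited; NULL means not sleeping */
};

static struct proc proctab[NPROC];

/* Returns 0 on success, -1 where the kernel would panic on a zero wchan. */
static int sleep_sketch(struct proc *p, const void *wchan)
{
    if (wchan == NULL)
        return -1;
    p->p_wchan = wchan;
    return 0;
}

/* Wakes every process sleeping on wchan; returns the number awakened. */
static int wakeup_sketch(const void *wchan)
{
    int i, n = 0;

    for (i = 0; i < NPROC; i++) {
        if (wchan != NULL && proctab[i].p_wchan == wchan) {
            proctab[i].p_wchan = NULL;
            n++;
        }
    }
    return n;
}
```

A process that slept on zero would be indistinguishable from one that is
not sleeping at all, so no wakeup could ever find it.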

-------------------------------- Page 16 --------------------------------

sys/trap.c              svc

The svc routine, called to process an svc interrupt, checks to be sure it
was called in problem state.  If not, it panics with this message.  Since
the kernel does not issue svc's, and since  only the kernel ever runs  in
supervisor state, this panic should never happen.
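
A state test of this kind can be sketched as follows, assuming the
System/370 PSW layout in which bit 15 of the first word is the problem
state bit.  The function name is hypothetical, and how trap.c actually
obtains the old PSW is not shown.

```c
#include <stdint.h>

/* Bit 15 of the first PSW word, counting bit 0 as the leftmost bit. */
#define PSW_PROBLEM_STATE 0x00010000u

/* Returns 1 if the saved PSW word shows problem state, 0 if supervisor. */
static int in_problem_state(uint32_t psw_first_word)
{
    return (psw_first_word & PSW_PROBLEM_STATE) != 0;
}
```

An interrupt handler would apply such a test to the old PSW saved by the
hardware, and panic when the result is 0.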

sys/trap.c              trap

The pgm routine, called to process a program interrupt, checks to be sure
the interrupt occurred  in problem  state.  If not,  it panics with  this
message, after printing out the program interrupt type and the general
registers.  Ideally, the kernel should not get program interrupts;
unfortunately, it does happen.  There is no general solution; it depends
on what program interrupt occurred and where.  The most frequent cause is
an invalid pointer.  The way to figure out what happened is to go back
through the stack, find out what routine caused the interrupt, and then
try to determine why.
