.he'IDENTIFIER''Page %'
.fo'Max Clowes'- % -'October, 1977'.
.pg
The RECOGNISER demo exhibits an approach to recognition - the template
matching approach - that featured in the very earliest thinking about
identifying and naming visual patterns. See, for example, Corcoran, and papers in
Fischer et al., e.g. Hannan's paper. While this was quickly seen to be wholly
impracticable, escaping from it is hard. 'Features` and 'measures` seemed
a plausible alternative. 'Measures` is a term imported from Decision Theory
and refers to the evidence on which a decision - in this case a categorisation - is
based.
For example the height and width of a letter might help to discriminate
between letters. It wouldn't separate the four letters in SAMPLES, but what
about the amount of ink, i.e. the 'density` of the pattern? PICTURES presents
a scenario for a SCAN function; how could you use it to measure
height, width, and density? Sutherland was influenced by the neurophysiology of
the octopus' eye, which seems to 'histogram` patterns offered to it. He believed
that quite a lot of observations about perception could be explained by assuming
that the two-dimensional pattern is turned into two one-dimensional
patterns - the 'projections` of the pattern onto the horizontal and
vertical axes.
.pg
Here are my ideas for modifying SCAN to deliver Sutherland's version
of "What the Octopus' eye tells the Octopus' brain" :-
.pg
The SCAN function described in PICTURES is intended to employ
RECORDPT to add to POINTLIST the locations of inked points encountered
in each pass of SCANROW. It is these points that we want to count, column-by-column.
Complete the design of SCAN and SCANROW and try out SCAN on one of the
RECOGNISER samples. (Remember to copy IMAGE into the Turtle PICTURE ....
IMAGE->PICTURE; before running SCAN(TESTHERE).)
 		POINTLIST =>
.br
will now exhibit the points we want to count. The function LENGTH applied
to POINTLIST should tell us how many points it contains
 		LENGTH(POINTLIST) =>
.br
but we want not a single number but a list of numbers, one for each ROW. To do
that we shall need to restructure POINTLIST so that each ROW is represented
by a separate ROWLIST. That means minimally that each execution of SCANROW
(in the REPEAT YMAX TIMES loop of SCAN) must return a ROWLIST. Here is
a modified version of SCANROW and RECORDPT to accomplish that:
 		FUNCTION SCANROW(DIST,FUNC) => ROWLIST;
 			[]-> ROWLIST;
 			REPEAT DIST TIMES
 				FUNC();
 				JUMP(1);
 			CLOSE;
 		END;
.sp2
 		FUNCTION RECORDPT;
 			[%HERE()%]<>ROWLIST -> ROWLIST;
 		END;
.sp
Now try
 		SCANROW(10,TESTHERE) =>
.br
if that's O.K. go on to
 		SCAN(TESTHERE)=>
.br
You should have a set of lists, one for each row scanned, that have accumulated
on the stack. To get them formed into a list assigned to POINTLIST we need
to modify SCAN:
 		FUNCTION SCAN(FUNC);
 		VARS ROWNUM XMAX YMAX;
 			1->ROWNUM;
 			ELEMENT(2,FNPROPS(PICTURE))->XMAX;
 			ELEMENT(4,FNPROPS(PICTURE))->YMAX;
 			[%REPEAT YMAX TIMES
 				STARTROW(ROWNUM);
 				SCANROW(XMAX,FUNC);
 				1+ROWNUM->ROWNUM;
 			CLOSE;%]->POINTLIST;
 		END;
.br
The 'decorated list brackets` gather up anything put on the stack by
actions carried out 'between them` so to speak, and listify them.
.pg
Try SCAN(TESTHERE); POINTLIST => again. We can now experiment with
LENGTH. Try 
 		LENGTH(POINTLIST) => 	and
 		LENGTH(HD(POINTLIST)) =>
.br
We need a list of numbers, one for each of the sublists of POINTLIST; that
list is XPROJN. And then we need to do the same for the columns to get YPROJN.
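.pg
One way to get XPROJN is a sketch like the following, assuming the POP-2 library
function MAPLIST is available (it applies a function to each element of a list and
returns the list of results); XPROJECT is my name, not one from the demos:
 		FUNCTION XPROJECT(ROWS) => XPROJN;
 			MAPLIST(ROWS,LENGTH) -> XPROJN;
 		END;
.br
so that XPROJECT(POINTLIST) => should print one count for each row scanned.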
.in-4
.pg
The two lists of numbers XPROJN and YPROJN that we get have 'maxima` and 'minima` in them
which should be characteristic of the letters. Try them on the SAMPLES.
If we assume that anything in XPROJN or YPROJN greater than 3 is a maximum, write
a function to produce a list of the number of maxima in each projection
for the unknown letter.
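.pg
Here is a sketch of such a function, assuming the list operations NULL, HD and TL
behave as elsewhere in these notes (COUNTMAXIMA is my name, not one from the demos):
 		FUNCTION COUNTMAXIMA(PROJN) => FOUND;
 			0 -> FOUND;
 			LOOPIF NOT(NULL(PROJN)) THEN
 				IF HD(PROJN) > 3 THEN FOUND+1 -> FOUND CLOSE;
 				TL(PROJN) -> PROJN;
 			CLOSE;
 		END;
.br
Then [%COUNTMAXIMA(XPROJN); COUNTMAXIMA(YPROJN)%] => should deliver the evidence
list for the unknown letter.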
.pg
For the sample E we would have:
 		[1 3]
.pg
Can you now add a further function to NAME the pattern on the basis of this
evidence? e.g.
.tp12
 		FUNCTION NAME(EVIDENCE) => CATEGORY;
 		VARS XMAX YMAX;
 			HD(EVIDENCE) -> XMAX;
 			HD(TL(EVIDENCE)) -> YMAX;
 			IF XMAX = 1 AND YMAX = 3 THEN
 				'E' -> CATEGORY;
 			ELSEIF	.....
 			ELSEIF	.....
 			CLOSE;
 		END;
.sp
How far does Sutherland's idea cope with the much-rehearsed defects of
template matching? This 'measures` view was the basis of Barrow & Popplestone's
approach to object recognition.
.pg
People got to be very ingenious with the measures game : Alt (in Fischer et al)
proposed to 'weigh` the pattern and calculate its 'moments of inertia` :
Kamentsky and Liu looked for arbitrary but 'informative`
(in the information-theoretic sense) 'n-tuples`. An 'n-tuple` is what you get
if you choose say half a dozen (n=6) picture locations at random and then toss
a coin to determine the colour - black or white (ink or space) - that you'd
like each location to be. The resultant 'template` is then tested all over
the unknown sample, and if it 'fits` anywhere, that measure is said to
be TRUE else FALSE. Using about 60 or more such n-tuples, each having
a different geometry and black/white make-up, Kamentsky and Liu were able to
successfully recognise a complete alphabet, upper and lower case. Their paper
is technically interesting because of the way that they really took the
technology of Information Theory to its limit in automating the design of the
n-tuples used in recognition.
.pg
'Features` as distinct from measures are things that you can see in the
pattern: a concavity, for example, a loop, a stroke (see Greanias et al). Many of these
were prompted by the then popular work of Hubel and Wiesel
on the neurophysiology of vision (see Uhr). I attempted to use their ideas in
a system that was intended to simulate the action of the visual cortex (Clowes 1966).
The features of line patterns that SEEPICTURE extracts are a good indicator of the sort
of things that these people had in mind but with the major difference of
course that SEEPICTURE assumes that lines are going to be 'thin` - ideally just
one element wide. (What happens if you give it 'thick` letters I wonder?)
The RECOGNISER demo takes you into the use of SEEPICTURE on the letter samples
there. The DATABASE demo will help you get at the features SEEPICTURE finds
in a sample. The ELLs, TEEs, and ENDs of SEEPICTURE's perceptual apparatus
clearly(!) discriminate between the letters of the SAMPLES set. What about finding
out the numbers of each resulting from an application of SEEPICTURE to a sample?
The following function is based partly on my reading of the DATABASE demo:
.tp6
 		FUNCTION NUMBEROF(TYPE) => FOUND;
 			0->FOUND;
 			FOREACH[^TYPE ??P]	THEN FOUND+1->FOUND CLOSE;
 		END;
.br
except that you won't find an explanation there of '^TYPE`. The notation
derives from the MATCH function that is used by DATABASE to retrieve
things from lists.
.pg
If the '^` prefix is omitted then reference to TYPE is understood to mean "look for
the string 'TYPE` in the database". That is if I try to execute
 		NUMBEROF("ELL") =>
.br
then it will ignore "ELL" and just look for "TYPE". In fact I want it to
look for the occurrence of "ELL" or whatever argument NUMBEROF has when I call
it, e.g. NUMBEROF("END") NUMBEROF("TEE") etc., and this meaning of TYPE is
signalled by ^TYPE. (There is an embryonic demo MATCH on the Matcher
function, and I have a bit on it at the end of ARITH2).
 	Try	NUMBEROF("ELL") =>
 		NUMBEROF("TEE") =>
.br
after you've run SEEPICTURE on a SAMPLE.
.pg
We can push recognition into the same mould that NAME assumed, using NUMBEROF
to deliver a list which contains, say, three numbers so that E would have
 	[2 1 3]		ELLs, TEEs and ENDs
.br
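.pg
A sketch of gathering that evidence list, assuming SEEPICTURE has already been run
on the sample and NUMBEROF is defined as above (FEATURES is my name, not one from
the demos):
 		FUNCTION FEATURES() => EVIDENCE;
 			[%NUMBEROF("ELL"); NUMBEROF("TEE"); NUMBEROF("END")%] -> EVIDENCE;
 		END;
.br
FEATURES() could then be handed straight to a NAME-style function.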
Of course we know that having 2 ELLs, a TEE and three ENDs isn't really an
adequate description of an E. It's a lot better (but in what sense?) than X and
Y projections, say, but we can construct lots of patterns with that
numerical combination of features that don't look a bit like an E. We could
strengthen it by insisting on 'East ENDs` say, and that their LINES be the same
length and the same length as the COLUMN of the TEE. But the most
subtle improvement is the interconnection of these features - two of the
ENDS go with the ELLS, and the other with the stem of the TEE whose cross pieces
are the other arms of the two ELLs. The data structure delivered by
SEEPICTURE contains all that connection information. See if you can follow
the description just offered of an E (the passage above beginning "two of the
ENDs" and ending "of the two ELLs"). How could we implement that idea as a
framework for naming? The DATABASE demo describes a number of functions that
seem relevant to that idea. Suppose we started our definition of an E with the
TEE feature.
 		VARS STRUCTURE;
 		PRESENT([TEE ?? STRUCTURE]) =>
.br
will tell us if there's a TEE present and assign to STRUCTURE the
DATABASE list that begins with TEE. We now want to take the relevant pieces of
that list and use them to address the other parts of the database where they
occur. How could one do that? What would a definition of an E look like in the language
of DATABASE? The paper by Evans (1969) discusses just such an approach to
recognition, also implicit in the very early work of Grimsdale
et al.
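.pg
One possible sketch of 'addressing the other parts of the database`: suppose (an
assumption about the database format on my part, not something the demos
guarantee) that each feature statement begins with its type followed by a point.
Then (FEATUREAT is my name):
 		FUNCTION FEATUREAT(TYPE,PT) => VERDICT;
 		VARS REST;
 			PRESENT([^TYPE ^PT ?? REST]) -> VERDICT;
 		END;
.br
would let us ask whether, say, an END occurs at a point mentioned in the TEE's
STRUCTURE.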
.pg
Another important problem that we touch on in the RECOGNISER demo is that of multiple
letters. How do we split up an image into its separate letters in advance
of recognising those letters? Perhaps we could implement a very general
definition of a letter as being all the connected lines : so we'd use something
like our putative approach to the definition of an E to first find and collect
together all the database statements that have 'overlapping` components.
.pg
Of course this wouldn't deal with touching letters or with letters with gaps
in them. But it would be a start to the segmentation problem.
I can think of two ways of dealing with those objections : (1) we should
do some 'preprocessing` to remove gaps (see PROCESSOR), (2) we need to devise
rules based on shape to decide on what pattern lines 'belong` together.
Guzman's classic work on aggregating regions (see AI.VAC.R and POLYSEE) would seem a
good paradigm, if only because every region in the picture of a scene touches
others ; only the bodi-ness of the regions can tell us which belong
together.
.pg
The idea that we should treat letter images as a scene is present in Marill's paper
although that approach seems to make no use of the data structure idea - an
omission I try to criticise in Clowes 1969. A different sense of seeing-letters-as-scenes
is also explored in my paper and especially in the Appendix by Knowles.
It is the idea that seeing a pattern of ink as a stroke or a free end
or a junction is an 
INTERPRETATION. It is this fact that enables us to accept so many different
styles of letter as equivalent. Not only can the geometry of a stroke
vary enormously, but so can its portrayal: as a 'line`, as a fat region, as a region
portrayed by a boundary line, even as a three-dimensional 'solid`
form! SEEPICTURE is in effect assuming that a 'translation` has been effected
into an idealised form. Some of the PROCESSOR work (e.g. Blum's transforms)
can be run as attempts to effect such a translation.
.pg
Finally what has all this letter recognition stuff got to do with reading? With the
puzzle of Johnson-Abercrombie's 'Paris in the the Spring`? How should
we re-configure these recognition processes to read well?
