.he `PARSE``Page %`
.fo `Steven Hardy`- % -`November 1977`
.ce2
Parsing Natural Language
------------------------
.sp2
This handout sketches out a simple 'top down` parser for a subset of
English. The demo is based, in part, on the SILLYSENT handout and is
intended to help the reader understand Terry Winograd`s
SHRDLU system (and also the SHRDLU demo).
.sp
Parsing is briefly discussed in the STUDENT demo.
.bb
SILLYSENT is a program embodying a grammar which generates random
sentences. Wherever the grammar offers a choice to the sentence
generator,
SILLYSENT picks at random. Obviously, we don`t do this in normal conversation;
the choices we make are constrained by what we want the final sentence to mean.
.sp
It seems reasonable, then, that when we hear a sentence we ought to discover the
sequence of syntactic choices it represents as these ought to help us
discover the 'meaning` of the sentence.
.sp
One way to study this 'parsing` would be to write a program like
SILLYSENT, except that the new program should use a finished sentence as a
guide, rather than make random choices.
For example:
 	: [THE BIG MAN LOVED A HAPPY GIRL] -> INPUT;
 	: SENTENCE() =>
 	** [[THE [BIG MAN]] [LOVED [A [HAPPY GIRL]]]]
.bb
The parser described below has the following properties:
.sp
.in8
.ti-3
1) Each syntactic class, like NP and QNOUN, has a separate parsing function associated with it.
.ti-3
2) These functions 'read` an instance of the appropriate class from the INPUT
and return a suitable bracketed representation of that instance.
.ti-3
3) If they are unable to do this (perhaps because the INPUT is syntactically
incorrect) the parsing functions are to leave the INPUT untouched and return
FALSE.
.in0
.bb
We will find it useful to have a function to READ a word from the INPUT:
 	: FUNCTION READ() => RESULT;
 	:	IF INPUT = [] THEN
 	:		FALSE -> RESULT
 	:	ELSE
 	:		HD(INPUT) -> RESULT;
 	:		TL(INPUT)-> INPUT
 	:	CLOSE
 	: END
.sp
.tp 14
We can now use this function to parse, say, an adjective:
 	: FUNCTION ADJ() => RESULT;
 	:	VARS SAVED, TEMP;
 	:	INPUT -> SAVED;
 	:	FALSE -> RESULT;
 	:	READ() -> TEMP;
 	:	IF MEMBER(TEMP, [BIG HAPPY ...]) THEN
 	:		TEMP -> RESULT
 	:	CLOSE;
 	:	UNLESS RESULT THEN
 	:		SAVED -> INPUT
 	:	CLOSE
 	: END;
.sp
This must look horrific - need the function really be so complicated?
The straightforward answer is 'no`; it is this long only because I`ve made the function
fit a general pattern, thus:
 	: FUNCTION NAME() => RESULT;
 	:	<Save current input in case things go wrong>
 	:	<assume parse will be unsuccessful>
 	:	<read some components>
 	:	<if all okay set up successful result>
 	:	<but if failed then restore input>
 	: END;
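.sp
For readers who would like to experiment with the pattern, here is a rough sketch of READ and ADJ in Python rather than POP-2. The Parser class, its method names and the two-word adjective list are assumptions made purely for illustration.

```python
# Illustrative Python sketch, not POP-2; every name here is hypothetical.
ADJECTIVES = {"BIG", "HAPPY"}         # stand-in word list


class Parser:
    def __init__(self, words):
        self.input = list(words)      # plays the role of INPUT

    def read(self):
        """Remove and return the next word, or False if none is left."""
        if not self.input:
            return False
        word, self.input = self.input[0], self.input[1:]
        return word

    def adj(self):
        saved = self.input            # save input in case things go wrong
        result = False                # assume the parse will be unsuccessful
        temp = self.read()            # read some components
        if temp in ADJECTIVES:        # if all okay, set up successful result
            result = temp
        if not result:                # but if failed, restore the input
            self.input = saved
        return result
```

Calling adj() on a Parser built from BIG MAN should return BIG and leave MAN in the input; calling it again should return False and leave the input as it was.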
.bb
Using these conventions we could also write a program
to parse a nounphrase:
 	: FUNCTION NP() => RESULT;
 	:	VARS SAVED, T1,T2;
 	:	INPUT -> SAVED;
 	:	FALSE -> RESULT;
 	:	ARTICLE() -> T1;
 	:	QNOUN() -> T2;
 	:	IF T1 AND T2 THEN
 	:		[%T1, T2%] -> RESULT
 	:	CLOSE;
 	:	UNLESS RESULT THEN
 	:		SAVED -> INPUT
 	:	CLOSE;
 	: END;
.sp
Notice that this function wastefully tries to parse a QNOUN even if it
failed to parse an ARTICLE.
.bb
.tp 23
The QNOUN function is a little tricky to write since
there are two alternatives for a qualified noun:
.sp
 	: FUNCTION QNOUN() => RESULT;
 	:	VARS SAVED, T1,T2;
 	:	INPUT -> SAVED;
 	:	FALSE -> RESULT;
 	:	NOUN() -> T1;
 	:	IF T1 THEN
 	:		T1 -> RESULT
 	:	ELSE
 	:		SAVED -> INPUT;
 	:		ADJ() -> T1;
 	:		IF T1 THEN
 	:			QNOUN() -> T2;
 	:			IF T2 THEN
 	:				[%T1, T2%] -> RESULT
 	:			CLOSE
 	:		CLOSE
 	:	CLOSE;
 	:	UNLESS  RESULT THEN
 	:		SAVED -> INPUT
 	:	CLOSE
 	: END;
.sp
.sp
This function tries its first alternative (a noun);
if that succeeds, all is well; otherwise QNOUN resets the input (unnecessarily in this case) and tries its second alternative.
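.sp
The two alternatives can also be sketched in Python (not POP-2). This is only an illustration: the Parser class, the word() helper and the tiny vocabularies are assumptions, and word() merges READ and the membership test into one step.

```python
# Illustrative Python sketch, not POP-2; names and vocabularies are hypothetical.
ADJECTIVES = {"BIG", "HAPPY"}
NOUNS = {"MAN", "GIRL"}


class Parser:
    def __init__(self, words):
        self.input = list(words)          # plays the role of INPUT

    def word(self, vocabulary):
        """Read one word if it is in vocabulary; otherwise leave input alone."""
        if self.input and self.input[0] in vocabulary:
            head, self.input = self.input[0], self.input[1:]
            return head
        return False

    def qnoun(self):
        saved = self.input
        result = False
        t1 = self.word(NOUNS)             # first alternative: a bare noun
        if t1:
            result = t1
        else:
            t1 = self.word(ADJECTIVES)    # second: adjective, then a QNOUN
            if t1:
                t2 = self.qnoun()
                if t2:
                    result = [t1, t2]
        if not result:                    # e.g. adjective read but no noun
            self.input = saved
        return result
```

The restore at the end matters when an adjective has been read but no qualified noun follows it: without it the adjective would be lost from the input.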
.bb
I don't really like the above programs -
they are far too long and complicated.
Perhaps we could simplify them a bit by putting some of the common
code into a separate function, thus:
 	: FUNCTION PARSE(TYPE) => RESULT;
 	:	VARS SAVED;
 	:	INPUT -> SAVED;
 	:	FALSE -> RESULT;
 	:	TYPE();
 	:	UNLESS RESULT THEN
 	:		SAVED -> INPUT
 	:	CLOSE
 	: END;
.tp 14
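We can use this function to rewrite our qualified noun recognizer.
Note that the recognizer no longer declares a RESULT of its own: it assigns
to the RESULT of the PARSE that called it, so it should be called only through PARSE.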
 	: FUNCTION QNOUN();
 	:	VARS T1, T2;
 	:	PARSE(NOUN) -> RESULT;
 	:	IF RESULT THEN EXIT;
 	:	PARSE(ADJ) -> T1;
 	:	UNLESS T1 THEN EXIT;
 	:	PARSE(QNOUN) -> T2;
 	:	UNLESS T2 THEN EXIT;
 	:	[%T1,T2%] -> RESULT;
 	: END
.tp 10
Here is the function to parse nounphrases:
 	: FUNCTION NP();
 	:	VARS T1, T2;
 	:	PARSE(ARTICLE) -> T1;
 	:	UNLESS T1 THEN EXIT;
 	:	PARSE(QNOUN) -> T2;
 	:	UNLESS T2 THEN EXIT;
 	:	[%T1, T2%] -> RESULT
 	: END
Notice that this version of NP is 'cleverer' than before - it 'gives up'
straightaway if it doesn't find an ARTICLE.
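.sp
Here is a rough Python sketch (not POP-2) of the same idea carried through to a whole sentence. Several things are assumptions: the rules SENTENCE = NP VP and VP = VERB NP are guessed from the bracketing in the earlier example, the vocabulary is a stand-in, and the rules return their results explicitly instead of assigning to PARSE's RESULT.

```python
# Illustrative Python sketch, not POP-2; grammar and names are assumptions.
WORDS = {
    "ARTICLE": {"THE", "A"},
    "ADJ": {"BIG", "HAPPY"},
    "NOUN": {"MAN", "GIRL"},
    "VERB": {"LOVED"},
}
state = {"input": []}                 # plays the role of INPUT


def parse(rule):
    """Run rule; if it returns False, put the input back as it was."""
    saved = state["input"]
    result = rule()
    if not result:
        state["input"] = saved
    return result


def word(kind):
    """Build a rule that reads a single word of the given kind."""
    def rule():
        words = state["input"]
        if words and words[0] in WORDS[kind]:
            state["input"] = words[1:]
            return words[0]
        return False
    return rule


def qnoun():
    noun = parse(word("NOUN"))
    if noun:
        return noun
    adj = parse(word("ADJ"))
    if not adj:
        return False
    rest = parse(qnoun)
    if not rest:
        return False
    return [adj, rest]


def np():
    article = parse(word("ARTICLE"))
    if not article:
        return False                  # give up straight away
    noun = parse(qnoun)
    if not noun:
        return False
    return [article, noun]


def vp():
    verb = parse(word("VERB"))
    if not verb:
        return False
    obj = parse(np)
    if not obj:
        return False
    return [verb, obj]


def sentence(words):
    state["input"] = list(words)

    def rule():
        subject = parse(np)
        if not subject:
            return False
        predicate = parse(vp)
        if not predicate:
            return False
        return [subject, predicate]
    return parse(rule)
```

None of the rules restores the input itself; that job is left entirely to parse(), which is why every rule is called through it. The sketch does not check that the whole input has been used up.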
.bb
By now you ought to have an idea of how to complete a 'bracketer`
for simple English sentences. However, a good parser can do more than
this one task.
.sp
In the SILLYSENT demo, it is suggested that words might have 'features`
to prevent silly sentences like
.sp
 	: THE TINY ANT LIFTED THE
 	:	BIG RED SMALL LORRY
.sp
I don`t have too many ideas about how to do this but here are a few guesses.
.sp
We ought to give the parsing functions a list of expected
features and expect back a list of actual features. Some check that the
two lists are consistent should be made. For example, the word ANT
could have features:
 	: [[TYPE NOUN] [CLASS ANIMATE] [STRENGTH LOW]]
.br
LORRY might be
 	: [[TYPE NOUN] [CLASS INANIMATE] [WEIGHT HIGH]]
.sp
The verb LIFT might have associated with it some advice to the
effect that the STRENGTH of its subject ought to 'match`
the WEIGHT of its object.
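.sp
One guess at what such advice might look like, sketched in Python (not POP-2). The feature lists for ANT and LORRY are the ones given above; everything else is an assumption, in particular the ordering LOW < HIGH and the reading of 'match` as 'strength at least as great as weight`.

```python
# Illustrative Python sketch, not POP-2; the matching rule is an assumption.
LEVEL = {"LOW": 0, "HIGH": 1}         # assumed ordering of feature values

FEATURES = {
    "ANT": {"TYPE": "NOUN", "CLASS": "ANIMATE", "STRENGTH": "LOW"},
    "LORRY": {"TYPE": "NOUN", "CLASS": "INANIMATE", "WEIGHT": "HIGH"},
}


def lift_advice(subject, obj):
    """Return True unless the subject is known to be too weak for the object."""
    strength = FEATURES[subject].get("STRENGTH")
    weight = FEATURES[obj].get("WEIGHT")
    if strength is None or weight is None:
        return True                   # nothing known, so nothing to object to
    return LEVEL[strength] >= LEVEL[weight]


# Hypothetical table attaching advice to verbs.
VERB_ADVICE = {"LIFTED": lift_advice}
```

With these features the advice for LIFTED rejects ANT as the subject when LORRY is the object, since a LOW strength does not match a HIGH weight.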
.sp
If this is too hard we ought, at least, to expect the various adjectives
in a noun phrase to not have contradictory features. For example
knowing that BLUE has property [COLOUR BLUE] and
RED has [COLOUR RED] ought to stop our parser accepting THE RED
BLUE BALL.
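.sp
The adjective check might be sketched as follows, in Python rather than POP-2. The COLOUR features are the ones mentioned above; the SIZE features for BIG and SMALL, and the function name merge_features, are assumptions added for illustration.

```python
# Illustrative Python sketch, not POP-2; SIZE features are assumed.
FEATURES = {
    "RED": {"COLOUR": "RED"},
    "BLUE": {"COLOUR": "BLUE"},
    "BIG": {"SIZE": "BIG"},           # assumption, not from the handout
    "SMALL": {"SIZE": "SMALL"},       # assumption, not from the handout
}


def merge_features(words):
    """Combine the features of words; return False on a contradiction."""
    combined = {}
    for word in words:
        for name, value in FEATURES.get(word, {}).items():
            # A feature already seen with a different value is a clash.
            if combined.get(name, value) != value:
                return False
            combined[name] = value
    return combined
```

A noun phrase parser could call merge_features on the adjectives it has read and fail when the answer is False. Note that a word with no known features contributes nothing, so the combined result can be empty; callers should test for False explicitly.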
.bb
This discussion of parsing has completely omitted any mention of what
one ought to do with ungrammatical input.
This is obviously a problem since, in practice, much of what we hear is
ungrammatical. I don`t know how to deal with this problem.
