Hackers generally learn best by taking things apart or by building things. So after reading an introduction to Parts Of Speech in Carnie’s Syntax, it’s time to build something: a toy Part Of Speech Tagger for English.
When we think of parts of speech we usually reach for semantic definitions. Nouns are places, people, or things. Verbs are actions. But when studying Generative Grammar we assign parts of speech according to syntactic roles. Determining parts of speech can be a bit of a chicken and egg problem. On the one hand, we need to know the part of speech assigned to each word in order to parse a sentence. On the other hand, syntactic rules provide clues for identifying each word’s part of speech. In fact, syntactic role is the deciding factor for part of speech, since many words take on different identities depending on their context.
Fortunately, there are other clues to a word’s part of speech. First is the structure of the word itself, or its morphology. In linguistics, morphology is the study of how words are constructed from smaller components called morphemes. For example, words ending in -ment, such as basement, are usually nouns. Plural nouns take on the suffix -s or -es, as in deaths and taxes.
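To make the morphological idea concrete, here is a minimal sketch of a suffix-based guesser. The suffix table, the tag names, and the function name guess_by_suffix are my own illustrative assumptions rather than anything taken from Carnie, and the rules are only heuristics: a word like implement ends in -ment but is often a verb, and the -s rule also matches third-person verbs.

```python
# Hypothetical suffix table: each entry maps a word ending to a likely tag.
# These are heuristics only, not rules that always hold.
SUFFIX_HINTS = [
    ("ment", "N"),    # basement, government
    ("ness", "N"),    # happiness
    ("ous", "Adj"),   # famous
    ("ly", "Adv"),    # quickly
]


def guess_by_suffix(word):
    """Return a tentative tag based on the word's ending, or None."""
    lower = word.lower()
    for suffix, tag in SUFFIX_HINTS:
        # Require some stem before the suffix so very short words are skipped.
        if lower.endswith(suffix) and len(lower) > len(suffix) + 1:
            return tag
    # Plural -s / -es is weak evidence for a noun (it also matches verbs
    # such as "runs"), so it is checked last.
    if lower.endswith("s") and not lower.endswith("ss") and len(lower) > 3:
        return "N"
    return None


if __name__ == "__main__":
    for w in ["basement", "deaths", "taxes", "quickly", "the"]:
        print(w, guess_by_suffix(w))
```

Because the guess can be wrong, it seems safer to treat the result as a tentative tag that syntactic context can later override.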
Most words belong to what are called open classes. Open classes are large and easily take on new words. There are also the closed classes, which are relatively small and rarely take on new words. The closed classes include: Determiner, Preposition, Conjunction, Tense Marker, Negation, Complementizer. We can assign a part of speech to these words using a direct match, then use some basic syntactic rules to assign parts of speech to nearby words.
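A rough sketch of that two-step idea might look like the following. The word lists are tiny, hand-picked samples, and the two neighbor rules (a word after a tense marker is a verb; a word between a determiner and another closed-class word or the end of the sentence is a noun) are simplifying assumptions of mine, not the full rules from the book. Ambiguous words such as that, which can be a determiner or a complementizer, are left out on purpose.

```python
# Small, hand-picked samples of closed-class words (illustrative only).
CLOSED_CLASS = {
    "D": {"the", "a", "an", "this", "these", "those"},
    "P": {"in", "on", "of", "with", "under"},
    "Conj": {"and", "or", "but"},
    "T": {"will", "can", "must", "should", "may"},
    "Neg": {"not"},
    "C": {"if", "whether"},
}

# Invert to a word -> tag map so closed-class words are a direct lookup.
WORD_TO_TAG = {w: tag for tag, words in CLOSED_CLASS.items() for w in words}


def tag_sentence(sentence):
    """Tag closed-class words by lookup, then guess some of their neighbors.

    Returns a list of (word, tag) pairs; tag is None when nothing applies.
    """
    words = sentence.lower().split()
    tags = [WORD_TO_TAG.get(w) for w in words]
    for i in range(len(words)):
        if tags[i] is not None:
            continue
        prev_tag = tags[i - 1] if i > 0 else None
        # "END" marks the sentence boundary; None means "still untagged".
        next_tag = tags[i + 1] if i + 1 < len(words) else "END"
        if prev_tag == "T":
            tags[i] = "V"   # assumed rule: tense marker is followed by a verb
        elif prev_tag == "D" and next_tag is not None:
            tags[i] = "N"   # assumed rule: D + word + closed-class word/end
    return list(zip(words, tags))


if __name__ == "__main__":
    print(tag_sentence("The dog will bark"))
    print(tag_sentence("the big dog"))
```

In the second example nothing after the determiner gets tagged, because big could be an adjective or a noun and these two rules alone cannot decide. That is the kind of gap morphological analysis and further syntactic rules would need to fill.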
The Part Of Speech tagger will use a combination of closed classes, morphological analysis, and syntactic rules as described in Carnie’s Syntax. An introductory chapter on word categories is by no means comprehensive, so a Part of Speech tagger based on this limited information will be a toy or, at best, a prototype. But building this application will reinforce what I’ve learned and perhaps provide a base that I can build on. Once I’ve completed the design and written the code I’ll publish them here.