It’s been over three months since I posted anything to nerd1951.com, so here’s a quick summary of what I’ve been working on. I’ll post more detailed discussions of these subjects over the next few weeks.
The Linguistics Wars
I’ve been reading The Linguistics Wars by Randy Harris, an introduction to linguistics for the lay reader. It recounts the contentious disagreements among American theoretical linguists (primarily Noam Chomsky and his students) in the ’60s and ’70s, and in doing so illustrates what linguists actually do. Since Generative Grammar was at the center of this controversy, The Linguistics Wars includes a deeper discussion of grammatical theories than Pinker’s The Language Instinct, and I think it’s a good follow-up to that book.
The questions about Generative Grammar and theoretical linguistics from that era still have not been settled. It is impossible to prove or disprove a hypothesis without evidence, and it seems impossible to obtain evidence about the internal workings of the human brain. But cognitive scientists have discovered that the brain does reveal some of its secrets through tell-tale eye movements, and they have devised some clever experiments using this information to gather data about how the brain processes language. Technology is also contributing to the quest with the advent of the functional MRI. I’m working my way through a number of recent papers comparing linguistic theories in light of this new evidence.
Part-of-Speech tagger software
I continue to develop my Part-of-Speech tagger software. Categorizing words by their syntactic function is a more interesting problem than it first appears. The set of Part-of-Speech categories depends on your approach to syntax and grammar, so I’ve been working on a more flexible design for my tagger. In the initial design I planned to use XML files only for output; XML is useful because it is human readable, yet it has enough structure to be easily processed by software. XSLT stylesheets can be used to transform the XML into HTML or PDF, making it even easier for people to read. I’ve now decided to expand my use of XML and define as much of the Part-of-Speech tagger as possible in XML configuration files. This approach should provide more flexibility in determining the application’s behavior. Expect me to geek out on C++ programming for a couple of posts, where I’ll discuss several open source frameworks for processing XML in C++. I’ll also describe how I used C++ templates and the Abstract Factory design pattern to quickly integrate different XML frameworks for evaluation.
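To give a flavor of the templates-plus-Abstract-Factory idea before those posts arrive, here is a minimal sketch. It is not the tagger’s actual code: every name in it (XmlReader, XmlWriter, XmlFactory, BackendFactory, roundTrip) is made up for illustration, and the two “backends” are toy string-based stand-ins rather than real XML frameworks. The point is only that one class template can stamp out a concrete factory for each framework, so the rest of the program never names a specific library.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Abstract products: the only XML operations this sketch needs.
struct XmlWriter {
    virtual ~XmlWriter() = default;
    virtual std::string element(const std::string& tag, const std::string& text) const = 0;
};
struct XmlReader {
    virtual ~XmlReader() = default;
    virtual std::string textOf(const std::string& xml, const std::string& tag) const = 0;
};

// Abstract factory: hands out a reader/writer pair from the same backend.
struct XmlFactory {
    virtual ~XmlFactory() = default;
    virtual std::unique_ptr<XmlReader> makeReader() const = 0;
    virtual std::unique_ptr<XmlWriter> makeWriter() const = 0;
};

// One template generates the concrete factory for any backend.
template <typename Reader, typename Writer>
struct BackendFactory : XmlFactory {
    std::unique_ptr<XmlReader> makeReader() const override {
        return std::unique_ptr<XmlReader>(new Reader());
    }
    std::unique_ptr<XmlWriter> makeWriter() const override {
        return std::unique_ptr<XmlWriter>(new Writer());
    }
};

// Naive helpers for the toy backends (no real XML parsing here).
inline std::string between(const std::string& s, const std::string& open,
                           const std::string& close) {
    const std::size_t start = s.find(open);
    if (start == std::string::npos) return "";
    const std::size_t from = start + open.size();
    const std::size_t end = s.find(close, from);
    if (end == std::string::npos) return "";
    return s.substr(from, end - from);
}
inline std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical backend A: compact output, exact-match reader.
struct CompactWriter : XmlWriter {
    std::string element(const std::string& tag, const std::string& text) const override {
        return "<" + tag + ">" + text + "</" + tag + ">";
    }
};
struct CompactReader : XmlReader {
    std::string textOf(const std::string& xml, const std::string& tag) const override {
        return between(xml, "<" + tag + ">", "</" + tag + ">");
    }
};

// Hypothetical backend B: indented output, whitespace-trimming reader.
struct IndentedWriter : XmlWriter {
    std::string element(const std::string& tag, const std::string& text) const override {
        return "<" + tag + ">\n  " + text + "\n</" + tag + ">\n";
    }
};
struct IndentedReader : XmlReader {
    std::string textOf(const std::string& xml, const std::string& tag) const override {
        return trim(between(xml, "<" + tag + ">", "</" + tag + ">"));
    }
};

using CompactBackend = BackendFactory<CompactReader, CompactWriter>;
using IndentedBackend = BackendFactory<IndentedReader, IndentedWriter>;

// Evaluation code depends only on the abstract factory, never on a backend.
std::string roundTrip(const XmlFactory& factory, const std::string& word) {
    const std::unique_ptr<XmlWriter> writer = factory.makeWriter();
    const std::unique_ptr<XmlReader> reader = factory.makeReader();
    assert(writer && reader);
    return reader->textOf(writer->element("w", word), "w");
}
```

Calling roundTrip(CompactBackend(), "dog") and roundTrip(IndentedBackend(), "dog") runs the same evaluation code against both stand-ins. In a real comparison, each Reader/Writer pair would presumably wrap one XML framework, and trying another framework would mean writing one more pair plus a one-line alias.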