My primary line of research is building and studying psychocomputational
models of first language acquisition. In other words, I use
computational techniques to model how children learn the
grammar of their native (first) language.
Most of these models consist of three core components:
the linguistic framework (e.g. the grammar formalisms in play);
the linguistic environment (e.g. the sentences encountered by the language learner);
the algorithm which the learner employs to achieve the final (correct) grammar.
One key research question is:
Given a framework and an algorithm, what properties of the linguistic
environment are most/least conducive to efficient learning?
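As a rough illustration only (a toy sketch, not one of the models studied in this work), the three components and the question above might be set up as follows. The two-parameter grammar space, the word-order patterns, the names LANGUAGES, ambiguous_forms, sample_sentence and sentences_to_converge, and the error-driven single-parameter-flip learner are all assumptions made for this example.

```python
import random

# Framework (toy assumption): a grammar is a pair of binary parameters
# (subject_first, verb_before_object) and licenses a few word-order patterns.
LANGUAGES = {
    (1, 1): {"S V", "S V O"},
    (1, 0): {"S V", "S O V"},
    (0, 1): {"V S", "V O S"},
    (0, 0): {"V S", "O V S"},
}


def ambiguous_forms(target):
    """Forms of the target language that some other grammar also licenses."""
    return {
        form
        for form in LANGUAGES[target]
        if any(form in lang for g, lang in LANGUAGES.items() if g != target)
    }


def sample_sentence(target, p_ambiguous, rng):
    """Environment: draw an ambiguous form with probability p_ambiguous."""
    shared = sorted(ambiguous_forms(target))
    unique = sorted(LANGUAGES[target] - set(shared))
    pool = shared if rng.random() < p_ambiguous else unique
    return rng.choice(pool)


def sentences_to_converge(target, start, p_ambiguous, rng, max_sentences=10_000):
    """Algorithm (hypothetical): after a parse failure, flip one randomly chosen
    parameter and keep the flip only if the new grammar licenses the sentence.
    Returns the number of sentences consumed, or max_sentences if the cap is hit."""
    grammar = start
    for n in range(max_sentences):
        if grammar == target:
            return n
        sentence = sample_sentence(target, p_ambiguous, rng)
        if sentence in LANGUAGES[grammar]:
            continue  # no error, so the hypothesis is left unchanged
        i = rng.randrange(len(grammar))
        candidate = tuple(v ^ 1 if j == i else v for j, v in enumerate(grammar))
        if sentence in LANGUAGES[candidate]:
            grammar = candidate
    return max_sentences


# Vary one property of the environment (the share of ambiguous input)
# while holding the framework and the algorithm fixed.
rng = random.Random(0)
for p in (0.1, 0.5, 0.9):
    runs = [sentences_to_converge((1, 1), (0, 0), p, rng) for _ in range(1000)]
    print(p, sum(runs) / len(runs))
```

In this toy setting the learner needs both the shared form and the unambiguous form to reach the target from (0, 0), so very low and very high proportions of ambiguous input both slow it down. Whether anything similar holds for realistic frameworks and algorithms is exactly the kind of question at issue.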
An important focus is the effect of cross-language ambiguity on learning
efficiency. Many sentence forms (for example, Subject Verb Object) occur
in many languages. It is unclear how children, given the ambiguous
forms they are exposed to, are able to determine so efficiently the correct
grammar that generates their native language. (A second, equally
important target of investigation, just underway, involves within-language
ambiguity, e.g. "She gave the dog biscuits", and its effect on learning
efficiency.)
I have demonstrated that, for a variety of proposed learning algorithms
(within Chomsky's principles and parameters framework), only a narrow
range of linguistic conditions supports efficient learning, and that
these conditions differ considerably from one algorithm to the next. This leads
to the (perhaps not so surprising) conclusion that the success of any
computer model of human language acquisition must be measured by how well
the abstract computational/linguistic environment the model 'lives in'
matches the actual facts that delineate true human grammars.
Very recently, I have begun a secondary line of research that uses
machine learning techniques to extract information from corpora
of natural language discourse (e.g. email and chats) in order to detect
keywords that can be used to link sections of the discourse to relevant