Dr. William Gregory Sakas - RESEARCH

sakas@hunter.cuny.edu
Associate Professor
Department of Computer Science, Hunter College
Ph.D. Programs in Computer Science and Linguistics, The Graduate Center
City University of New York (CUNY)
 

My primary line of research involves building and studying psychocomputational models of first language acquisition. In other words, I use computational techniques to model the process by which children learn the grammar of their native (first) language.

Most of these models consist of three core components:

  • the linguistic framework (e.g. the grammar formalisms in play)
  • the linguistic environment (e.g. the sentences encountered by the language learner)
  • the algorithm which the learner employs to achieve the final (correct) grammar

    One key research question is:
    Given a framework and an algorithm, what properties of the linguistic environment are most/least conducive to efficient learning?
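
As a purely illustrative sketch (not one of the models described on this page), the three components and the research question above might be set up as follows. The two-parameter grammar space, the word-order forms, the error-driven learner that flips a single parameter, and all names (`sentence_forms`, `ambiguity`, `learn`, `p_ambiguous`) are assumptions made for the example.

```python
import random
from itertools import product

# Linguistic framework (toy assumption): a grammar is two binary parameters,
# (subject_first, verb_before_object).
GRAMMARS = list(product((0, 1), repeat=2))

def sentence_forms(grammar):
    """Return the (transitive, intransitive) forms the grammar licenses."""
    subject_first, verb_before_object = grammar
    vp = "V O" if verb_before_object else "O V"
    transitive = f"S {vp}" if subject_first else f"{vp} S"
    intransitive = "S V" if subject_first else "V S"
    return transitive, intransitive

def ambiguity(sentence):
    """Count how many grammars in the space license this sentence form."""
    return sum(sentence in sentence_forms(g) for g in GRAMMARS)

def learn(target, p_ambiguous, rng, max_inputs=10_000):
    """Return the number of inputs consumed before the hypothesis equals the
    target grammar, or None if that does not happen within max_inputs."""
    hypothesis = rng.choice(GRAMMARS)
    transitive, intransitive = sentence_forms(target)
    for n_inputs in range(max_inputs):
        if hypothesis == target:
            return n_inputs
        # Linguistic environment: the intransitive form is shared with another
        # grammar, so p_ambiguous controls how ambiguous the input is.
        sentence = intransitive if rng.random() < p_ambiguous else transitive
        if sentence in sentence_forms(hypothesis):
            continue  # no error, keep the current hypothesis
        # Learning algorithm (assumed): flip one random parameter and adopt
        # the new grammar only if it licenses the current sentence.
        i = rng.randrange(len(hypothesis))
        candidate = tuple(v ^ 1 if j == i else v for j, v in enumerate(hypothesis))
        if sentence in sentence_forms(candidate):
            hypothesis = candidate
    return max_inputs if hypothesis == target else None

if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        runs = [learn((1, 1), p, random.Random(seed)) for seed in range(200)]
        done = [n for n in runs if n is not None]
        mean = sum(done) / len(done) if done else float("nan")
        print(f"p_ambiguous={p}: converged {len(done)}/200, mean inputs {mean:.1f}")
```

Here `p_ambiguous` stands in for one property of the linguistic environment. Comparing convergence counts and mean input counts across its values is a crude way to ask which environments are more or less conducive to efficient learning for this particular toy learner.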

    An important focus is the effect of cross-language ambiguity on learning efficiency. Many sentence forms (for example, Subject Verb Object) occur across many languages. It is unclear how children, given the set of ambiguous forms they are exposed to, are able to determine so efficiently the correct grammar that generates their native language. (A second, equally important target of investigation, just underway, involves within-language ambiguity, e.g. "She gave the dog biscuits", and its effect on learning efficiency.)

    I have demonstrated that, for a variety of proposed learning algorithms (within Chomsky's principles and parameters framework), only a narrow range of linguistic conditions supports efficient learning, and that these conditions are quite different for different algorithms. This leads to the (perhaps not so surprising) conclusion that the success of any computer model of human language acquisition must be measured against the match between the abstract computational/linguistic environment the model 'lives in' and the actual facts that delineate true human grammars.

    Very recently, I have begun a secondary line of research that uses machine learning techniques to extract information from corpora of natural language discourse (e.g. email and chats) in order to detect keywords that can be used to link sections of the discourse to relevant online documents.