Search, Structure or Statistics?

A Comparative Study of Memoryless Heuristics for Syntax Acquisition

 

William Gregory Sakas (sakas@hunter.cuny.edu)

Department of Computer Science, Hunter College, CUNY

Ph.D. Programs in Linguistics and Computer Science, The Graduate Center, CUNY

695 Park Avenue, New York, NY 10065 USA

 

Eiji Nishimoto (enishimoto@gc.cuny.edu)

Ph.D. Program in Linguistics, The Graduate Center, CUNY

365 Fifth Avenue, New York, NY 10016 USA

 

 

Abstract

 

Although several studies propose computational models of the process by which children acquire the syntax of their native language, most focus on a single algorithm applied in a single domain. Typically, the focus is learnability – under what conditions an algorithm can or cannot acquire the grammar of the target (native) language. Here, we present a comparative study of 12 algorithmic heuristics that are run in a domain that consists of 16 abstract languages each generated by a different grammar specified in Chomsky’s principles and parameters framework. The heuristics consist of both those used in previously established models and new variations that we introduce. In contrast to a learnability study, our focus is feasibility – how much time and/or effort is required to achieve the target grammar. We find that the best heuristics make use of structural information obtained by parsing input sentences during the course of learning, that the performance of statistically-based heuristics are next in line, and finally, that heuristics that make use of hill-climbing search and a simple can-parse/can’t-parse outcome from the parsing mechanism perform least well.