Natural Language Generation systems usually require substantial knowledge about the structure of the target language in order to perform the final task in the generation process - the mapping from semantic representation to text known as surface realisation. Designing knowledge bases of this kind, typically represented as sets of grammar rules, may however become a costly, labour-intensive enterprise. In this work we take a statistical approach to surface realisation in which no linguistic knowledge is hard-coded, but rather trained automatically from large corpora. Results of a small experiment in the generation of referring expressions show significant levels of similarity between our (computer-generated) text and those produced by humans, besides the usual benefits commonly associated with statistical NLP such as low development costs, domain- and language-independency.
展开▼