This paper studies sentence alignment for a Chinese parallel corpus. The corpus consists of the original classical Chinese text of Shiji (Records of the Grand Historian), written by Sima Qian during the Western Han Dynasty, and its modern Chinese translation. A log-linear model combines three features, namely sentence length, sentence alignment mode, and co-occurring Chinese characters, to align the classical and modern sentences of Shiji. Experiments show that precision and recall are highest when all three features are considered together, reaching 94.4% and 94.3% respectively.
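As a rough illustration of how a log-linear model can combine the three features named in the abstract, the following minimal sketch scores one candidate alignment between classical and modern sentences. It is not the paper's implementation: the feature definitions, the mode priors in `MODE_PRIOR`, the `LENGTH_RATIO` value, the weights, and all function names are hypothetical assumptions chosen only to make the example runnable.

```python
import math

# Hypothetical values for illustration only; none of them come from the paper.
MODE_PRIOR = {(1, 1): 0.85, (1, 2): 0.08, (2, 1): 0.05, (2, 2): 0.02}
LENGTH_RATIO = 1.6  # assumed modern/classical character-length ratio
WEIGHTS = {"length": 1.0, "mode": 1.0, "cooc": 1.0}


def length_feature(src, tgt):
    """Penalty for deviating from the assumed length ratio (0 is best)."""
    src_len = sum(len(s) for s in src)
    tgt_len = sum(len(t) for t in tgt)
    expected = src_len * LENGTH_RATIO
    return -abs(tgt_len - expected) / max(expected, 1.0)


def mode_feature(src, tgt):
    """Log prior of the alignment mode, e.g. 1-1 or 1-2."""
    return math.log(MODE_PRIOR.get((len(src), len(tgt)), 1e-6))


def cooc_feature(src, tgt):
    """Share of distinct source characters that also occur in the target."""
    src_chars = set("".join(src))
    tgt_chars = set("".join(tgt))
    if not src_chars:
        return 0.0
    return len(src_chars & tgt_chars) / len(src_chars)


def bead_score(src, tgt, weights=WEIGHTS):
    """Unnormalised log-linear score: weighted sum of the three features."""
    return (weights["length"] * length_feature(src, tgt)
            + weights["mode"] * mode_feature(src, tgt)
            + weights["cooc"] * cooc_feature(src, tgt))


# Two candidate modern sentences of equal length for one classical sentence.
classical = ["学而时习之"]
candidate_a = ["学习并且时常温习它"]
candidate_b = ["今天天气很好啊呀呢"]
best = max([candidate_a, candidate_b], key=lambda t: bead_score(classical, t))
```

In this sketch the two candidates have the same length and the same 1-1 mode, so only the co-occurring character feature separates them. A full aligner would search over sequences of such candidate alignments (for example with dynamic programming) and would estimate the weights from data instead of fixing them by hand.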