This paper applies statistical bigram and trigram models to automatic part-of-speech tagging of Chinese. Using an algorithm with linear time complexity, O(n), closed and open tests were carried out with a training set of 200,000 characters and a test set of 10,000 characters. The zero elements of the sparse matrix and the tagging results were then analyzed statistically.
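To make the bigram case concrete, the following is a minimal sketch of a tag-bigram tagger decoded with the Viterbi algorithm. It is not the paper's implementation: the function names (`train`, `log_prob`, `viterbi`), the toy corpus, the tag labels, and the add-one smoothing used to keep zero counts in the sparse transition matrix from collapsing a path are all assumptions made for illustration. For a fixed tag set, the work per word is bounded, so decoding time grows linearly with sentence length.

```python
import math
from collections import defaultdict

def train(tagged_sentences):
    """Count tag-bigram transitions and word emissions from a tagged corpus."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    vocab = set()
    for sent in tagged_sentences:
        prev = "<s>"  # hypothetical sentence-start marker
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab

def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability (an assumed smoothing choice),
    so unseen events, i.e. zero cells of the matrix, stay finite."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))

def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence under the bigram model."""
    if not words:
        return []
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unknown words
    score = {t: log_prob(trans["<s>"], t, n_tags)
                + log_prob(emit[t], words[0], n_words) for t in tags}
    back = []  # back[i][t] = best previous tag for tag t at word i + 1
    for word in words[1:]:
        new_score, pointers = {}, {}
        for t in tags:
            prev = max(tags,
                       key=lambda p: score[p] + log_prob(trans[p], t, n_tags))
            new_score[t] = (score[prev]
                            + log_prob(trans[prev], t, n_tags)
                            + log_prob(emit[t], word, n_words))
            pointers[t] = prev
        score = new_score
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: score[t])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]

# Toy corpus with illustrative tags: r = pronoun, v = verb, ns = place name.
corpus = [
    [("我", "r"), ("爱", "v"), ("北京", "ns")],
    [("他", "r"), ("爱", "v"), ("上海", "ns")],
    [("我", "r"), ("去", "v"), ("北京", "ns")],
]
model = train(corpus)
print(viterbi(["他", "去", "上海"], *model))  # ['r', 'v', 'ns']
```

A trigram variant would condition each transition on the two preceding tags, which enlarges the transition table and makes zero cells more frequent; that is one reason smoothing of some kind matters more in the trigram case.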