Conference: Pacific Asia Conference on Language, Information and Computation

The Construction of a Dictionary for a Two-layer Chinese Morphological Analyzer

Abstract

We built a morphological analyzer that anyone can use freely for research purposes. A practical system requires a dictionary of reasonable size. The initial dictionary, built from the Penn Chinese Treebank corpus v4.0, contains only 33,438 entries. Because this is quite small, we apply unknown word detection methods to a large body of raw text to extract new words for the system dictionary, which yields a dictionary with 120,769 entries. Finally, we propose a two-layer morphological analyzer that caters for two sets of outputs. The first layer produces the minimal segmentation units that we define, and the second layer transforms the output of the first layer into the original segmentation units defined by the Penn Chinese Treebank.
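
The two-layer idea can be illustrated with a minimal Python sketch. The abstract does not say how either layer is implemented, so everything here is an assumption: layer 1 uses greedy forward maximum matching as a stand-in, layer 2 merges adjacent minimal units by lookup in a word list, and the function names (`segment_minimal`, `merge_units`) and the toy dictionary entries are hypothetical.

```python
from typing import List, Set


def segment_minimal(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Layer 1 (stand-in): greedy forward maximum matching.

    Tries the longest dictionary entry at each position and falls back
    to a single character when nothing matches.
    """
    units: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                units.append(candidate)
                i += size
                break
    return units


def merge_units(units: List[str], treebank_words: Set[str], max_span: int = 3) -> List[str]:
    """Layer 2 (stand-in): merge adjacent minimal units into larger words.

    Joins up to `max_span` consecutive units when the concatenation is
    listed in `treebank_words`; otherwise keeps the unit unchanged.
    """
    merged: List[str] = []
    i = 0
    while i < len(units):
        for j in range(min(len(units), i + max_span), i, -1):
            candidate = "".join(units[i:j])
            if j == i + 1 or candidate in treebank_words:
                merged.append(candidate)
                i = j
                break
    return merged


# Hypothetical toy entries, not taken from the paper's dictionary.
minimal_dictionary = {"北京", "大学"}
treebank_words = {"北京大学"}

layer1 = segment_minimal("我在北京大学", minimal_dictionary)
layer2 = merge_units(layer1, treebank_words)
print(layer1)
print(layer2)
```

Keeping the two layers as separate functions mirrors the abstract's design: a caller can consume the fine-grained units directly or request the coarser treebank-style units without re-segmenting the text.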
