Pacific Asia Conference on Language, Information and Computation; November 1-3, 2006; Wuhan (CN)

The Construction of a Dictionary for a Two-layer Chinese Morphological Analyzer



Abstract

We built a morphological analyzer that anyone can use freely for research purposes. A practical system needs a dictionary of reasonable size. The initial dictionary is built from the Penn Chinese Treebank corpus v4.0 and contains only 33,438 entries. Because this initial dictionary is quite small, we apply unknown word detection methods to a large raw text corpus to extract new words and add them to the system dictionary. This yields a dictionary with 120,769 entries. Finally, we propose a two-layer morphological analyzer that produces two sets of outputs. The first layer produces the minimal segmentation units that we define, and the second layer transforms the output of the first layer into the original segmentation units defined by the Penn Chinese Treebank.
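
The two-layer idea can be illustrated with a small sketch. The abstract does not say how either layer is implemented, so everything below is an assumption made for illustration: layer 1 is stood in for by greedy forward maximum matching against a dictionary, and layer 2 by a lookup table of adjacent minimal units that should be joined into one coarser unit. The names `segment_minimal`, `merge_units`, and `merge_table`, and the toy dictionary, are hypothetical and do not come from the paper.

```python
from typing import List, Set, Tuple


def segment_minimal(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Layer 1 stand-in: greedy forward maximum matching (not the authors' method)."""
    units: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                units.append(candidate)
                i += size
                break
    return units


def merge_units(
    units: List[str],
    merge_table: Set[Tuple[str, ...]],
    max_span: int = 3,
) -> List[str]:
    """Layer 2 stand-in: join adjacent minimal units that appear in merge_table."""
    merged: List[str] = []
    i = 0
    while i < len(units):
        # Prefer the longest run of units that the table says form one unit.
        for span in range(min(max_span, len(units) - i), 0, -1):
            window = tuple(units[i:i + span])
            if span == 1 or window in merge_table:
                merged.append("".join(window))
                i += span
                break
    return merged


# Toy data, invented for this example only.
toy_dictionary = {"北京", "大学", "学生"}
toy_merge_table = {("北京", "大学")}

minimal = segment_minimal("北京大学学生", toy_dictionary)
coarse = merge_units(minimal, toy_merge_table)
```

With this toy data, `minimal` holds the fine-grained units and `coarse` holds the result after the second pass joins 北京 and 大学 into a single unit. Keeping the two passes separate means either output can be consumed on its own, which is the property the abstract attributes to the two-layer design.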
