Automatic Chinese unknown word extraction using small-corpus-based method

Abstract

Chinese unknown word extraction is an important problem in Chinese language processing, and it poses two main difficulties. First, almost any Chinese character can either stand as a word on its own or form part of other words. Second, Chinese text has no blank spaces between words to mark their boundaries. Several approaches have been proposed, but they have drawbacks. Here, we present and develop a method that extracts Chinese unknown words more efficiently and precisely. It retains its efficiency and accuracy even when the document set available for training is small, and it can also extract unknown words that occur rarely. These advantages make it practical for real applications.
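
The two difficulties named in the abstract can be made concrete with a small sketch. This is not the method proposed in the paper, which the abstract does not describe; it is a generic character n-gram frequency baseline, and the function name `candidate_ngrams`, its parameters, the toy sentences, and the toy lexicon are all assumptions made for illustration.

```python
from collections import Counter


def candidate_ngrams(docs, known_words, min_n=2, max_n=4, min_count=2):
    """Return recurring character n-grams that are absent from a known lexicon.

    Illustrative baseline only (hypothetical helper, not the paper's method).
    """
    counts = Counter()
    for doc in docs:
        # With no spaces between words, every substring is a possible word.
        for n in range(min_n, max_n + 1):
            for i in range(len(doc) - n + 1):
                counts[doc[i:i + n]] += 1
    return {
        gram: count
        for gram, count in counts.items()
        if count >= min_count and gram not in known_words
    }


# Toy corpus and lexicon (assumed for illustration).
docs = ["我喜欢区块链技术", "区块链很有趣"]
known_words = {"喜欢", "技术", "有趣"}

for gram, count in sorted(candidate_ngrams(docs, known_words).items()):
    print(gram, count)
```

On this toy input the baseline returns the plausible new word 区块链 together with its fragments 区块 and 块链, all with the same count. Raw frequency alone cannot tell a word from its parts, and a candidate that appears only once in a small corpus never reaches the count threshold, which is the kind of weakness the abstract attributes to earlier approaches.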
