Conference: Asia Pacific Web and Web-Age Information Management

New Word Detection in Ancient Chinese Literature



Abstract

Mining ancient Chinese corpora is harder than mining modern Chinese text because no complete dictionary of ancient Chinese words exists, which degrades the performance of tokenizers. Finding new words in ancient Chinese texts is therefore important. In this paper, the Apriori algorithm is improved and used to produce candidate character sequences, and a long short-term memory (LSTM) neural network is used to identify word boundaries. Furthermore, we design a word confidence feature to measure the confidence score of new words. The experimental results demonstrate that the improved Apriori-like algorithm greatly improves the recall rate of valid candidate character sequences, and that the average accuracy of our method on new word detection rises to 89.7%.
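The abstract does not spell out how the Apriori algorithm was improved, so the following is only a minimal sketch of the classic Apriori idea applied to character sequences: a sequence of length n is counted only if both of its length n-1 sub-sequences were already frequent. The function name `frequent_sequences` and the parameters `min_support` and `max_len` are hypothetical names chosen for illustration and do not come from the paper.

```python
from collections import Counter


def frequent_sequences(sentences, min_support=2, max_len=4):
    """Level-wise (Apriori-style) mining of frequent character sequences.

    Illustrative only: this is the textbook downward-closure pruning,
    not the improved variant described in the paper.
    """
    frequent = {}
    prev = None  # frequent sequences found at the previous length
    for n in range(1, max_len + 1):
        counts = Counter()
        for s in sentences:
            for i in range(len(s) - n + 1):
                gram = s[i:i + n]
                # Apriori pruning: count a sequence only if its prefix and
                # suffix of length n-1 were both frequent.
                if prev is None or (gram[:-1] in prev and gram[1:] in prev):
                    counts[gram] += 1
        prev = {g: c for g, c in counts.items() if c >= min_support}
        if not prev:
            break  # no frequent sequence of this length, so none longer
        frequent.update(prev)
    return frequent


if __name__ == "__main__":
    sample = ["学而时习之", "学而不思则罔", "思而不学则殆"]
    found = frequent_sequences(sample, min_support=2, max_len=3)
    # Keep only multi-character sequences as word candidates.
    candidates = {g: c for g, c in found.items() if len(g) > 1}
    print(candidates)
```

In a pipeline like the one the abstract describes, sequences such as these would only be candidates; deciding which of them are real words is the job of the boundary model and the confidence score.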
