Conference: Chinese Automation Congress

A Multi-granularity Chinese Word Segmentation Method Based on Semantic Similarity for Risk Sources



Abstract

This paper presents a multi-granularity Chinese word segmentation method for risk sources based on semantic similarity. Drawing on expert knowledge, a risk-source root dictionary and a risk-source extension dictionary were built, and the extension dictionary was trained with a Latent Dirichlet Allocation (LDA) model. First, each sentence containing a root word was split at that word into two parts using the root dictionary. Next, the segmented words immediately to the left and right of the root word were each combined with it to form two new candidate extension words, and the semantic similarity of each candidate to the root word was computed with the LDA model. Finally, the candidate with the highest similarity was taken as the extended risk word. Experiments show that the method reached an average success rate of 94.19% and can be used for text segmentation of criminal cases.
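The candidate-selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sentence is assumed to be already segmented into words; each word is assumed to have a topic distribution taken from a trained LDA model (the `TOPIC_VECTORS` table and its values are hypothetical); a two-word candidate's distribution is approximated by averaging its words' distributions; and because the abstract does not name the similarity measure, cosine similarity is used here as a stand-in. The toy sentence 嫌疑人 / 持 / 管制 / 刀具 / 抢劫 ("the suspect, carrying a controlled knife, robbed") with root word 刀具 ("knife") is likewise illustrative.

```python
import math

# Hypothetical topic distributions p(topic | word), standing in for values
# read from a trained LDA model. Three topics; the numbers are illustrative.
TOPIC_VECTORS = {
    "管制": [0.70, 0.20, 0.10],
    "刀具": [0.60, 0.30, 0.10],
    "抢劫": [0.10, 0.20, 0.70],
}


def cosine(u, v):
    """Cosine similarity between two topic vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def phrase_vector(words):
    """Approximate a multi-word candidate's topic vector by averaging its words."""
    vectors = [TOPIC_VECTORS[w] for w in words]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def score(candidate, root_vec):
    """Similarity of a candidate to the root word; -inf if a word has no topics."""
    if any(w not in TOPIC_VECTORS for w in candidate):
        return float("-inf")
    return cosine(phrase_vector(candidate), root_vec)


def extend_root(tokens, root_dict):
    """Return the extended risk word for the first usable root word, else None."""
    for i, token in enumerate(tokens):
        # Only dictionary roots that have a topic vector can be scored.
        if token not in root_dict or token not in TOPIC_VECTORS:
            continue
        # Split the segmented sentence at the root word into two parts.
        left, right = tokens[:i], tokens[i + 1:]
        candidates = []
        if left:
            candidates.append([left[-1], token])  # left neighbour + root
        if right:
            candidates.append([token, right[0]])  # root + right neighbour
        if not candidates:
            return token
        root_vec = TOPIC_VECTORS[token]
        best_score, best = max(
            ((score(c, root_vec), c) for c in candidates), key=lambda sc: sc[0]
        )
        # Fall back to the bare root word if no candidate had topic information.
        return "".join(best) if best_score > float("-inf") else token
    return None


if __name__ == "__main__":
    tokens = ["嫌疑人", "持", "管制", "刀具", "抢劫"]
    print(extend_root(tokens, {"刀具"}))  # → 管制刀具
```

With these toy vectors, 管制 has a topic profile close to 刀具 while 抢劫 does not, so 管制刀具 is chosen over 刀具抢劫. Since LDA outputs are probability distributions, a distribution-based measure such as Hellinger distance would be a reasonable alternative to cosine similarity.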
