A Multi-granularity Chinese Word Segmentation Method Based on Semantic Similarity for Risk Sources

Abstract

This paper presents a multi-granularity Chinese word segmentation method for risk sources based on semantic similarity. Drawing on expert knowledge, a risk source root dictionary and a risk source extension dictionary are established, and the extension dictionary is trained using a Latent Dirichlet Allocation (LDA) model. First, a sentence containing a root word is split into two parts at that word using the root dictionary. The two semantic segments adjacent to the root word are then combined with it to form two new candidate extension words, whose semantic similarity to the root word is computed with the LDA model. Finally, the candidate with the highest similarity is taken as the extension risk word. Experiments show that the method achieves an average success rate of 94.19% and can be used for text segmentation of criminal cases.
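
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the sentence is already split into coarse segments, that each neighbouring segment is simply concatenated with the root word, and that similarity is the cosine between topic vectors. The function names, the example sentence, and the topic vectors are all hypothetical; in the paper the similarity comes from a trained LDA model, which is replaced here by a hand-written lookup table.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pick_extension_word(segments, root_dictionary, topic_vectors):
    """Return the candidate extension word most similar to the first root word found.

    segments        -- coarse segments of one sentence (assumed pre-segmented)
    root_dictionary -- set of root risk words
    topic_vectors   -- hypothetical stand-in for LDA topic distributions
    """
    for i, token in enumerate(segments):
        if token not in root_dictionary:
            continue
        # Build up to two candidates: left neighbour + root, root + right neighbour.
        candidates = []
        if i > 0:
            candidates.append(segments[i - 1] + token)
        if i + 1 < len(segments):
            candidates.append(token + segments[i + 1])
        root_vec = topic_vectors[token]
        scored = [
            (cosine(topic_vectors[c], root_vec), c)
            for c in candidates
            if c in topic_vectors
        ]
        # Fall back to the bare root word when no candidate can be scored.
        return max(scored)[1] if scored else token
    return None  # the sentence contains no root word

# Hypothetical data, invented for illustration only.
segments = ["他", "携带", "管制", "刀具", "进入", "车站"]
root_dictionary = {"刀具"}
topic_vectors = {
    "刀具": [0.8, 0.1, 0.1],
    "管制刀具": [0.7, 0.2, 0.1],
    "刀具进入": [0.2, 0.3, 0.5],
}
result = pick_extension_word(segments, root_dictionary, topic_vectors)
```

With these invented vectors, the left-hand candidate is closer to the root word than the right-hand one, so it is the one selected. A real system would obtain the vectors from an LDA model trained on the extension dictionary rather than from a fixed table.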

Bibliographic details

Source
Chinese Automation Congress | 2019 | 1 v. | 3 pages
Conference location
Authors
Zhi Li; Jiaqiang Wang; Bei Zhang; Yanfang Zhang; Xueling Ji; Yu Wang
Author affiliations
Conference organizer
Original format: PDF
Language of text
Chinese Library Classification: Automation technology and equipment
Keywords
multi-granularity; Chinese word segmentation; semantic similarity; latent Dirichlet allocation; risk source

Similar literature

1. Chinese-English Bilingual Word Semantic Similarity Based on Chinese WordNet [J]. Yangyang Wu, Siying Wu, Duansheng Chen. Journal of Software, 2015, No. 1.
2. Applications of Corpus-based Semantic Similarity and Word Segmentation to Database Schema Matching [J]. Aminul Islam, Diana Inkpen, Iluju Kiringa. The VLDB Journal, 2008, No. 5.
3. Chinese WeChat and Blog Hot Words Detection Method Based on Chinese Semantic Clustering [J]. Wang Yu, Song Sixin, Zhou Fanfan. Intelligent Automation and Soft Computing, 2017, No. 4.
4. A Multi-granularity Chinese Word Segmentation Method Based on Semantic Similarity for Risk Sources [C]. Zhi Li, Jiaqiang Wang, Bei Zhang. Chinese Automation Congress, 2019.
5. Image Segmentation: Structural Similarity, Belief Propagation and Radial Basis Functions for Level Set Based Methods [D]. Zhu, Yingxuan. 2010.
6. Similarity of fMRI Activity Patterns in Left Perirhinal Cortex Reflects Semantic Similarity between Words [O]. Rose Bruffaerts, Patrick Dupont, Ronald Peeters. 2013.
7. Applications of Corpus-based Semantic Similarity and Word Segmentation to Database Schema Matching [O]. 2014.
