Conference: IEEE International Conference on Signal, Information and Data Processing

Optimized Clustering based on Semantic Similarity of Components for Short Text

Abstract

Short text usually consists of a few words and at most one sentence. Because its features are sparse and its expressions are complex, conventional similarity measures and clustering methods may perform poorly on it. This paper studies semantic clustering of short text. First, the event component and the modifying components are extracted by analyzing the dependency syntax structure. Then, the semantic similarity between texts is analyzed with the component as the basic unit: two texts are considered similar only when both their event components and their modifying components are similar. Furthermore, because the proposed semantic similarity may produce more topics, which implies clusters of varied shapes, an indefinite number of clusters, and more noise points, density peak clustering is adopted, and a regression parameter is designed to improve the determination of the cluster number and the handling of noise. Tested on a public data set, the proposed semantic clustering achieves a purity $P$ of 96% and an $F$-measure of 71.97%. The proposed method has been applied in the electric power industry and is worth promoting.
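
The two stages described in the abstract can be sketched in Python as below. This is one possible reading under stated assumptions, not the authors' implementation: the dependency-parsing step is skipped and each text is supplied already reduced to an event component and a set of modifying words; Jaccard token overlap stands in for the paper's semantic similarity, which the abstract does not define; the rule "similar only when both parts are similar" is modeled as the minimum of the two scores; and because the paper's regression parameter is not described, plain density peak clustering (Rodriguez and Laio, 2014) is used with the number of centres passed in by hand. The names `jaccard`, `component_similarity`, `density_peak_clustering` and the sample texts are hypothetical.

```python
# Hypothetical representation: each short text is already reduced to
# (event tokens, modifier tokens). The dependency parsing that would
# extract these components is NOT shown here.


def jaccard(a, b):
    """Token-overlap score, used here as a stand-in for semantic similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def component_similarity(t1, t2):
    """Assumed rule: texts are similar only if BOTH the event components and
    the modifying components are similar, so keep the weaker of the two."""
    return min(jaccard(t1[0], t2[0]), jaccard(t1[1], t2[1]))


def density_peak_clustering(dist, d_c, n_centers):
    """Standard density peak clustering on a symmetric n x n distance matrix.

    d_c is the cutoff distance; n_centers (>= 1) is fixed by hand because the
    paper's regression parameter is not described in the abstract.
    Returns one cluster label per point.
    """
    n = len(dist)
    # Local density rho: number of other points within the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank = {p: r for r, p in enumerate(order)}
    delta = [0.0] * n
    parent = [-1] * n  # nearest point with higher density
    for r, i in enumerate(order):
        if r == 0:
            delta[i] = max(dist[i])  # densest point: distance to farthest point
            continue
        j = min(order[:r], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    # Centres are the points with the largest gamma = rho * delta;
    # the densest point always ranks first, so it is always a centre.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: (-gamma[i], rank[i]))[:n_centers]
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Every other point joins the cluster of its nearest denser neighbour,
    # which is already labelled because points are visited in density order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


# Hypothetical electric-power work orders, given as (event, modifiers).
texts = [
    (frozenset({"replace", "transformer"}), frozenset({"main", "substation"})),
    (frozenset({"replace", "transformer"}), frozenset({"main", "substation", "north"})),
    (frozenset({"replace", "transformer"}), frozenset({"main", "substation"})),
    (frozenset({"inspect", "line"}), frozenset({"overhead", "high-voltage"})),
    (frozenset({"inspect", "line"}), frozenset({"overhead", "high-voltage", "faulty"})),
    (frozenset({"inspect", "line"}), frozenset({"overhead", "high-voltage"})),
]
dist = [[1.0 - component_similarity(a, b) for b in texts] for a in texts]
labels = density_peak_clustering(dist, d_c=0.5, n_centers=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Taking the minimum is a strict way to encode "both parts must be similar"; a product or a weighted mean of the two scores would be softer alternatives. Density peak clustering suits the situation the abstract describes because it does not assume convex cluster shapes and picks centres from the data rather than from random initialization.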
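
The abstract reports purity and F-measure without defining them. The sketch below assumes the common clustering definitions: purity credits each cluster with its majority class, and the F-measure takes, for each true class, the best F1 score over all clusters and averages these scores weighted by class size. The paper may use a different variant, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def purity(y_true, y_pred):
    """Each cluster is credited with its majority class; purity is the share
    of texts that belong to their cluster's majority class."""
    joint = Counter(zip(y_pred, y_true))
    best = {}
    for (cluster, _cls), count in joint.items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(y_true)


def f_measure(y_true, y_pred):
    """For each true class, take the best F1 over all clusters, then
    average these scores weighted by class size."""
    n = len(y_true)
    class_size = Counter(y_true)
    cluster_size = Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))
    total = 0.0
    for k, n_k in class_size.items():
        best = 0.0
        for c, n_c in cluster_size.items():
            n_kc = joint[(k, c)]
            if n_kc == 0:
                continue
            precision, recall = n_kc / n_c, n_kc / n_k
            best = max(best, 2 * precision * recall / (precision + recall))
        total += n_k / n * best
    return total


y_true = ["a", "a", "a", "b", "b", "b"]
y_pred = [0, 0, 1, 1, 1, 1]
print(round(purity(y_true, y_pred), 4))     # 0.8333
print(round(f_measure(y_true, y_pred), 4))  # 0.8286
```

Purity rewards homogeneous clusters and can be inflated by producing many small clusters, which is one reason it is usually reported together with an F-measure, as the abstract does.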
