Optimized Clustering based on Semantic Similarity of Components for Short Text

Abstract

A short text usually consists of a few words and at most one sentence. Because its features are sparse and its expressions complicated, similarity measurement and clustering may not work well on such texts. This paper studies semantic clustering for short text. First, based on an analysis of the dependency syntax structure, the event component and the modifier components are extracted. Then, the semantic similarity of texts is analyzed with the component as the basic unit: two texts are considered similar only when both the event component and the modifier components are similar. Further, the proposed semantic similarity may lead to more topics, which means varied cluster shapes, an indefinite number of clusters, and more noise points. Density peak clustering is therefore selected, and a regression parameter is designed to improve the handling of cluster number and noise. Tested on a public data set, the proposed semantic clustering achieves a purity $P$ of 96% and an $F$ measure of 71.97%. The proposed method has been used in the electrical power industry and is worth promoting.
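
The abstract does not spell out the clustering step or the regression parameter, so the following is only a minimal sketch of standard cutoff-kernel density peak clustering together with the usual purity score. It assumes a precomputed pairwise distance matrix (for example, one minus a component-level similarity), a fixed cutoff `d_c`, and a fixed number of clusters. The function names `density_peak_labels` and `purity` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def density_peak_labels(dist: Sequence[Sequence[float]], d_c: float,
                        n_clusters: int) -> List[int]:
    """Label points with plain density peak clustering (assumed variant,
    not the paper's optimized version)."""
    n = len(dist)
    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point gets its largest distance to any point.
            delta[i] = max(dist[i])
            continue
        # Nearest point among those ranked denser than i.
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j
    # Centers are the points with the largest gamma = rho * delta.
    # Sorting `order` (stable) keeps the densest point first on ties.
    centers = sorted(order, key=lambda i: -(rho[i] * delta[i]))[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Going in decreasing density, each parent is labeled before its children.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


def purity(labels: Sequence[int], truth: Sequence[Hashable]) -> float:
    """Fraction of points that belong to the majority class of their cluster."""
    per_cluster = {}
    for label, cls in zip(labels, truth):
        per_cluster.setdefault(label, Counter())[cls] += 1
    return sum(max(c.values()) for c in per_cluster.values()) / len(labels)


if __name__ == "__main__":
    # Toy one-dimensional data standing in for text distances.
    points = [0, 1, 2, 50, 51, 52]
    dist = [[abs(a - b) for b in points] for a in points]
    print(density_peak_labels(dist, d_c=1.5, n_clusters=2))  # [0, 0, 0, 1, 1, 1]
```

In this plain form the number of clusters is passed in by hand; the abstract indicates that the paper instead designs a regression parameter to deal with the cluster number and noise points, which this sketch does not attempt to reproduce.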

Bibliographic details

Source: IEEE International Conference on Signal, Information and Data Processing, 2019, pp. 1-6 (6 pages)
Authors: Wensong Liu; Feng Lin; Zhuqing Hu; Jinhui Zhang
Original format: PDF
Keywords: short text; event analysis; semantic similarity of component; density peak clustering optimization

Similar literature

1. Measuring the short text similarity based on semantic and syntactic information [J]. Jiaqi Yang, Yongjun Li, Congjie Gao. Future Generation Computer Systems, 2021, Jan. issue.
2. A Comparison of Approaches for Measuring the Semantic Similarity of Short Texts Based on Word Embeddings [J]. Karlo Babić, Francesco Guerra, Sanda Martinčić-Ipšić. Journal of Information and Organizational Sciences, 2020, no. 2.
3. Wikipedia-Based Semantic Similarity Measurements for Noisy Short Texts Using Extended Naive Bayes [J]. Shirakawa Masumi, Nakayama Kotaro, Hara Takahiro. IEEE Transactions on Emerging Topics in Computing, 2015, no. 2.
4. Optimization research of short text semantic clustering based on the social media [C]. Ping Zhang, Jianzhong Wang. 2017 IEEE 3rd Information Technology and Mechatronics Engineering Conference, 2017.
5. Short-Text Semantic Similarity: Algorithms and Applications [D]. Sultan, Md Arafat. 2016.
6. Detection of medical text semantic similarity based on convolutional neural network [O]. Tao Zheng, Yimei Gao, Fei Wang. 2019.
7. Similarity Calculation Method of Chinese Short Text Based on Semantic Feature Space [O]. Liqiang Pan, Pu Zhang, Anping Xiong. 2015.
