Affinity Propagation Initialisation Based Proximity Clustering For Labeling in Natural Language Based Big Data Systems

Abstract

A key challenge for large natural-language text data is automatically extracting the knowledge embedded in it, in the form of entities and relations. State-of-the-art relation extraction systems require large amounts of labeled data, which is costly and very difficult to obtain, especially in industrial settings, because of the time constraints of subject matter experts. Techniques like distant supervision require a related knowledge base, which is rarely available. We have developed a novel model for automatically clustering textual Big Data, based on techniques inspired by Active Learning and Clustering, that can derive powerful insights and make the data ready for machine learning with minimal manual effort. Our approach differs from Active Learning in that we operate under weak supervision, where not all of the instances provided for training are manually labeled. It also differs from prevailing clustering algorithms in that we adopt a new approach of proximity clustering based on affinity propagation. Because the labeling effort is extrapolated, our model makes it easier to adopt deep learning approaches with minimal manual effort. In this paper, we describe our algorithm in detail, along with the experimental results obtained with it.
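
The abstract does not spell out the algorithm, so what follows is only a minimal sketch of the general idea it names: run affinity propagation to pick exemplars, label just those exemplars by hand, and extend each label to the points closest to it. The toy 2-D vectors (standing in for text embeddings), the function names `affinity_propagation` and `propagate_labels`, the median preference, and the nearest-exemplar rule are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import median


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def affinity_propagation(points, damping=0.5, iterations=200):
    """Return indices of the exemplars chosen by affinity propagation."""
    n = len(points)
    # Similarity s(i, k): negative squared distance.
    sim = [[-sq_dist(p, q) for q in points] for p in points]
    # Preference s(k, k): median off-diagonal similarity (assumed default).
    pref = median(sim[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        sim[i][i] = pref

    resp = [[0.0] * n for _ in range(n)]   # responsibilities r(i, k)
    avail = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i, k) = s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
        for i in range(n):
            for k in range(n):
                rival = max(avail[i][j] + sim[i][j] for j in range(n) if j != k)
                target = sim[i][k] - rival
                resp[i][k] = damping * resp[i][k] + (1 - damping) * target
        # a(i, k) = min(0, r(k, k) + positive r(i', k) for i' not in {i, k})
        # a(k, k) = sum of positive r(i', k) for i' != k
        for k in range(n):
            pos = [max(0.0, resp[j][k]) for j in range(n)]
            others = sum(pos) - pos[k]  # positive support from points other than k
            for i in range(n):
                if i == k:
                    target = others
                else:
                    target = min(0.0, resp[k][k] + others - pos[i])
                avail[i][k] = damping * avail[i][k] + (1 - damping) * target
    return [k for k in range(n) if resp[k][k] + avail[k][k] > 0]


def propagate_labels(points, exemplars, exemplar_labels):
    """Give every point the label of its nearest labeled exemplar."""
    return [
        exemplar_labels[min(exemplars, key=lambda k: sq_dist(p, points[k]))]
        for p in points
    ]


# Toy 2-D vectors standing in for text embeddings (hypothetical data).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 5.1), (5.1, 4.8)]
exemplars = affinity_propagation(points)
# Stand-in for the expert: only the exemplars receive a manual label.
exemplar_labels = {
    k: "relation_A" if points[k][0] < 2.5 else "relation_B" for k in exemplars
}
labels = propagate_labels(points, exemplars, exemplar_labels)
print(exemplars, labels)
```

In this sketch the manual effort scales with the number of exemplars rather than with the number of instances, which is one plausible reading of the "extrapolation of the labeling efforts" mentioned above.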

Bibliographic details

Source
IEEE Intl Conference on Big Data Security on Cloud; IEEE Intl Conference on High Performance and Smart Computing; IEEE Intl Conference on Intelligent Data and Security, 2020, pp. 1-7 (7 pages)
Authors
Adithya Bandi; Karuna Joshi; Varish Mulwad
Original format: PDF
Keywords
relation extraction; unsupervised labeling; clustering; affinity propagation; Labeling; Natural Language Processing; Big Data

Similar literature

1. Multivariate Time Series Data Clustering Method Based on Dynamic Time Warping and Affinity Propagation [J]. Xiaoji Wan, Hailin Li, Liping Zhang. Wireless Communications & Mobile Computing, 2021, No. a.

2. Affinity propagation clustering algorithm based on large-scale data-set [J]. Limin Wang, Kaiyue Zheng, Xing Tao. International Journal of Computers & Applications, 2018, No. 3.

3. Deployment Strategy for Car-Sharing Depots by Clustering Urban Traffic Big Data Based on Affinity Propagation [J]. Liu Zhihan, Jia Yi, Zhu Xiaolu. Scientific Programming, 2018, No. PTa1.

4. Affinity Propagation Initialisation Based Proximity Clustering For Labeling in Natural Language Based Big Data Systems [C]. Adithya Bandi, Karuna Joshi, Varish Mulwad. IEEE Intl Conference on Big Data Security on Cloud; IEEE Intl Conference on High Performance and Smart Computing; IEEE Intl Conference on Intelligent Data and Security, 2020.

5. Affinity Propagation Initialisation Based Proximity Clustering for Labeling [D]. Bandi, Adithya. 2020.

6. An Adaptive Weighted KNN Positioning Method Based on Omnidirectional Fingerprint Database and Twice Affinity Propagation Clustering [O]. Jingxue Bi, Yunjia Wang, Xin Li. 2018.

7. Application of Data Mining-Based Affinity Propagation Clustering Algorithm for Diagnosis of Mechanical Equipment Transmission System [O]. 2020.
