Conference: IEEE Intl Conference on Big Data Security on Cloud; IEEE Intl Conference on High Performance and Smart Computing; IEEE Intl Conference on Intelligent Data and Security

Affinity Propagation Initialisation Based Proximity Clustering For Labeling in Natural Language Based Big Data Systems

Abstract

A key challenge for large natural-language text data is automatically extracting the knowledge embedded in it, in the form of entities and relations. State-of-the-art relation extraction systems require large amounts of labeled data, which is costly and difficult to obtain, especially in industrial settings, because subject matter experts have limited time. Techniques such as distant supervision require a related knowledge base, which is rarely available. We have developed a novel model for automatically clustering textual Big Data, based on techniques inspired by Active Learning and Clustering, that can derive powerful insights and make the data ready for machine learning with minimal manual effort. First, our approach differs from Active Learning in that we operate under weak supervision, where not all of the instances provided for training are manually labeled. Second, it differs from prevailing clustering algorithms in that we adopt a new approach of proximity clustering based on affinity propagation. By extrapolating the labeling effort, our model makes it easier to adopt deep learning approaches with minimal manual effort. In this paper, we describe our algorithm in detail, along with the experimental results obtained with it.
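
The abstract does not spell out the authors' proximity-clustering variant. For orientation only, the standard affinity propagation message updates (the generic textbook formulation, not necessarily the initialisation scheme used in this paper) can be written as below. The symbols are assumptions introduced here, not taken from the source: s(i,k) is the similarity between instances i and k, s(k,k) is the "preference" of k to serve as an exemplar, r(i,k) is the responsibility, a(i,k) is the availability, and c_i is the exemplar assigned to instance i.

```latex
% Standard affinity propagation updates (generic formulation, assumed notation).
% r and a are initialised to zero and, in practice, damped between iterations.
\begin{align*}
r(i,k) &\leftarrow s(i,k) - \max_{k' \neq k} \bigl\{ a(i,k') + s(i,k') \bigr\} \\
a(i,k) &\leftarrow \min\Bigl\{ 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0,\, r(i',k)\} \Bigr\}, \quad i \neq k \\
a(k,k) &\leftarrow \sum_{i' \neq k} \max\{0,\, r(i',k)\} \\
c_i &= \arg\max_{k} \bigl\{ a(i,k) + r(i,k) \bigr\}
\end{align*}
```

Instances that select themselves (c_k = k) act as exemplars. In a labeling workflow of the kind the abstract describes, one plausible use is to have experts label only the exemplars and extend those labels to the other members of each cluster, though the paper itself should be consulted for the actual procedure.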