International Computer Engineering Conference

Ontology Learning Based on Word Embeddings for Text Big Data Extraction

Abstract

The term big data describes data that exists everywhere in huge volumes, in raw form, and in heterogeneous types. Unstructured and uncategorized data make up 95% of big data. Domain-relevant data cannot be extracted efficiently from text big data within a suitable time. As a result, text big data remains a barrier to big data integration and, subsequently, to big data analytics, because big data integration cannot take text big data into account when preparing data for analytics. Ontology, on the other hand, represents information and knowledge in a graph schema that provides shareable, reusable, domain-specific data. It therefore fits the need of text big data to extract domain-relevant data. This paper accordingly proposes an ontology learning (OL) methodology for text big data extraction. OL aims to provide algorithms, techniques, and tools for constructing ontologies automatically from text. The proposed OL method exploits a deep learning approach, namely word embeddings, and an advanced hierarchical clustering algorithm, namely BIRCH. Using word embeddings together with advanced hierarchical clustering improves OL quality in text big data extraction and reduces processing time. In addition, deep learning learns without supervision from massive amounts of unlabeled, uncategorized raw data. This major benefit addresses the analytical challenge of text big data. For evaluation, precision, recall, and F-value are measured for work quality, and running time is measured for performance. Quality is assessed by comparing the method's results with gold standard datasets. Experimental results and evaluation demonstrate that the proposed OL methodology is efficient and suitable for text big data extraction.
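The abstract names BIRCH as the clustering step over word embeddings but gives no implementation details. The sketch below is therefore only a hedged illustration of BIRCH's core data structure, the clustering feature CF = (N, LS, SS). It pairs that structure with a flat, single-pass threshold rule resembling BIRCH's leaf-level insertion. It omits the CF tree, the branching factor, and the later refinement phases, and it is not the authors' implementation. The function name `leaf_cluster`, the `threshold` value, and the toy two-dimensional vectors are hypothetical stand-ins for real word embeddings.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusteringFeature:
    """BIRCH clustering feature CF = (N, LS, SS) summarizing one subcluster."""
    n: int       # number of member vectors
    ls: list     # linear sum of the member vectors
    ss: float    # sum of squared norms of the member vectors

    @classmethod
    def from_vector(cls, v):
        return cls(1, list(v), sum(x * x for x in v))

    def merge(self, other):
        # CF additivity: the CF of a union is the component-wise sum.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        # RMS distance of members to the centroid: R^2 = SS/N - ||LS/N||^2
        c = self.centroid()
        r2 = self.ss / self.n - sum(x * x for x in c)
        return math.sqrt(max(r2, 0.0))  # clamp tiny negative rounding error


def leaf_cluster(embeddings, threshold):
    """Hypothetical flat, single-pass simplification of BIRCH phase 1 (no CF tree)."""
    cfs, members = [], []
    for word, vec in embeddings.items():
        cf_new = ClusteringFeature.from_vector(vec)
        if cfs:
            # Closest existing subcluster by Euclidean distance to its centroid.
            best = min(range(len(cfs)), key=lambda i: math.dist(cfs[i].centroid(), vec))
            merged = cfs[best].merge(cf_new)
            if merged.radius() <= threshold:
                cfs[best] = merged
                members[best].append(word)
                continue
        # No subcluster can absorb the word within the threshold: open a new one.
        cfs.append(cf_new)
        members.append([word])
    return members


# Hypothetical 2-D "embeddings"; real word vectors would come from a trained model.
toy = {"cat": (0.9, 0.1), "dog": (0.85, 0.15), "car": (0.1, 0.9), "truck": (0.15, 0.85)}
print(leaf_cluster(toy, threshold=0.1))  # [['cat', 'dog'], ['car', 'truck']]
```

Because CFs are additive, a subcluster's centroid and radius can be updated from three summary values without revisiting its member vectors. This property lets BIRCH process large inputs in a single pass.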
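The precision, recall, and F-value evaluation described in the abstract can be sketched as follows. The sketch assumes that both the method's output and the gold standard are reduced to sets of items, for example extracted concept labels, and compared by exact match. It also assumes that "F-value" means the balanced F-measure (the harmonic mean of precision and recall). The `evaluate` function and the concept lists are hypothetical, and the abstract does not state the paper's actual matching criteria.

```python
def evaluate(extracted, gold):
    """Precision, recall and balanced F-measure of extracted items vs. a gold standard."""
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)  # items present in both the output and the gold standard
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Hypothetical concept lists, for illustration only.
extracted = ["animal", "cat", "dog", "vehicle", "banana"]
gold = ["animal", "cat", "dog", "vehicle", "car", "truck"]
p, r, f = evaluate(extracted, gold)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.800 R=0.667 F=0.727
```

Running time, the abstract's performance measure, could be recorded separately, for instance by wrapping the extraction step with `time.perf_counter()`.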