Unsupervised Named Entity Normalization for Supporting Information Fusion for Big Bridge Data Analytics

Abstract

The large amount of multi-type and multi-source bridge data opens unprecedented opportunities for big data analytics to improve bridge deterioration prediction. Information fusion is needed prior to the analytics to transform the heterogeneous data from different sources into a unified representation. Resolving the ambiguities in the named entities extracted from bridge inspection reports is one of the most important fusion tasks. The ambiguity stems from the use of different and ambiguous surface forms for the same target named entity. There is, thus, a need for named entity normalization (NEN) methods that can map these ambiguous surface forms to their canonical form, an identifier concept. However, existing NEN methods are limited in this regard, because they mostly require pre-established knowledge (e.g., dictionaries or Wikipedia) and/or training data, and mostly ignore the impact of the normalization on data analytics. To address this need, this paper proposes an unsupervised NEN method. It includes two main components: candidate identifier concept generation based on the multi-grams of each named entity set, and candidate identifier concept ranking based on a proposed ranking function. The function uses the TF-IDF (term frequency-inverse document frequency) weight and is further improved by considering the impacts of gram lengths and positions on the ranking. It aims to balance the abstractness and the level of detail of the identifier concepts, so as to ensure that the resulting data are neither too dense nor too sparse for the analytics. A set of experiments was conducted to evaluate the performance of the proposed method, which achieved an accuracy of 84.5%.
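The two components described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual ranking function, so everything below is an assumption made for illustration: the function names `ngrams` and `rank_candidates`, the smoothed IDF formula, the `length_weight` parameter, and the way the length and position factors are combined are hypothetical stand-ins, and each entity set (the surface forms of one target entity) is treated as a "document" for the TF-IDF weight.

```python
import math
from collections import Counter


def ngrams(tokens, max_n=3):
    """Yield (gram, start_index) for every contiguous n-gram of up to max_n tokens."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n]), i


def rank_candidates(entity_sets, target, max_n=3, length_weight=0.5):
    """Rank candidate identifier concepts for entity_sets[target].

    ASSUMPTION: this score (TF-IDF times illustrative length and position
    factors) is a stand-in, not the ranking function proposed in the paper.
    """
    # Document frequency: the number of entity sets in which each gram occurs.
    df = Counter()
    for forms in entity_sets:
        seen = set()
        for form in forms:
            for gram, _ in ngrams(form.lower().split(), max_n):
                seen.add(gram)
        df.update(seen)

    # Term frequency and accumulated position score within the target set.
    tf = Counter()
    position_sum = Counter()
    for form in entity_sets[target]:
        tokens = form.lower().split()
        for gram, start in ngrams(tokens, max_n):
            tf[gram] += 1
            # Hypothetical position factor: grams that end nearer the end of
            # the surface form (often the head noun) get a value closer to 1.
            position_sum[gram] += (start + len(gram)) / len(tokens)

    n_sets = len(entity_sets)
    scores = {}
    for gram, count in tf.items():
        idf = math.log((1 + n_sets) / (1 + df[gram])) + 1  # smoothed IDF
        length_factor = 1 + length_weight * (len(gram) - 1)  # favors longer grams
        position_factor = position_sum[gram] / count  # mean position score
        scores[" ".join(gram)] = count * idf * length_factor * position_factor

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Made-up surface forms, grouped by the entity they refer to.
    entity_sets = [
        ["bridge deck", "concrete deck", "deck", "the deck surface"],
        ["steel girder", "girder", "main girder"],
    ]
    for gram, score in rank_candidates(entity_sets, target=0)[:3]:
        print(f"{gram}: {score:.3f}")
```

In this sketch, raising `length_weight` pushes the ranking toward longer, more detailed candidates, while lowering it favors shorter, more abstract ones; this is one simple way to express the abstractness versus detail trade-off that the abstract mentions.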

Bibliographic record

Source
Workshop of the European Group for Intelligent Computing in Engineering | 2018 | 488p | 20 pages in total
Authors
Kaijian Liu; Nora El-Gohary
Original format: PDF
Chinese Library Classification: TP30-53
Keywords
Named entity normalization; Big data analytics; Bridge deterioration prediction
