Automated extraction of concept matcher thesaurus from semi-structured catalogue-like sources of data on the web

Abstract

Ontology design is time-consuming and iterative, and so is populating a data set with knowledge that follows the chosen or newly developed ontology in line with the principles of the Semantic Web and Linked Open Data. Both require either expert knowledge or a set of tools for scraping data from the web. A valid and consistent ontology, and consistent knowledge within the data set, require unification of concepts. This means overcoming the ambiguity and synonymy of the terms that become individuals of the ontology. In this paper we focus on the techniques used to organise a Russian food product data set under a lightweight FOOD Ontology, and on concept matching in particular. We outline the main approaches to data-set concept unification and synonymic term matching, and the ways of collecting dictionaries for the matcher. We then introduce a tool, developed for on-the-fly concept matching, that parses catalogue-like semi-structured resources and extracts a thesaurus from them.
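
The following is a rough illustration of the thesaurus-extraction and concept-matching idea described above. It assumes a hypothetical catalogue page in which each canonical concept appears in a `<dt>` element and its synonyms follow in `<dd>` elements. The markup, the helper names (`extract_thesaurus`, `build_matcher`, `match`) and the case-folding normalisation are all assumptions made for illustration. None of them describe the author's published tool. Matching returns a set of candidate concepts, so a term listed under two concepts shows up as ambiguous instead of being silently resolved.

```python
from html.parser import HTMLParser

# Hypothetical catalogue markup (an assumption, not a real source):
# each <dt> names a canonical concept, and the <dd> elements after it
# list synonyms or alternative names for that concept.
CATALOGUE_HTML = """
<dl>
  <dt>Milk</dt><dd>cow's milk</dd><dd>молоко</dd>
  <dt>Buckwheat</dt><dd>гречка</dd><dd>buckwheat groats</dd>
</dl>
"""


class CatalogueThesaurusParser(HTMLParser):
    """Collects {canonical concept: set of synonyms} from <dt>/<dd> pairs."""

    def __init__(self):
        super().__init__()
        self.thesaurus = {}   # canonical term -> set of synonyms
        self._concept = None  # canonical term the next <dd> belongs to
        self._tag = None      # "dt" or "dd" while inside such an element
        self._buf = []        # text fragments of the open element

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag, self._buf = tag, []

    def handle_data(self, data):
        if self._tag:  # ignore whitespace and text outside <dt>/<dd>
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # e.g. a closing </a> nested inside a <dt>
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        if text and tag == "dt":
            self._concept = text
            self.thesaurus.setdefault(text, set())
        elif text and self._concept is not None:
            self.thesaurus[self._concept].add(text)
        self._tag = None


def extract_thesaurus(html):
    """Parse catalogue-like HTML into a concept -> synonyms mapping."""
    parser = CatalogueThesaurusParser()
    parser.feed(html)
    parser.close()
    return parser.thesaurus


def normalise(term):
    """Case-fold and collapse whitespace so surface variants compare equal."""
    return " ".join(term.casefold().split())


def build_matcher(thesaurus):
    """Invert the thesaurus: normalised variant -> set of canonical concepts."""
    index = {}
    for concept, synonyms in thesaurus.items():
        for variant in {concept, *synonyms}:
            index.setdefault(normalise(variant), set()).add(concept)
    return index


def match(index, term):
    """Return candidate concepts for a term (empty set if unknown).
    More than one candidate means the term is ambiguous."""
    return set(index.get(normalise(term), ()))


if __name__ == "__main__":
    index = build_matcher(extract_thesaurus(CATALOGUE_HTML))
    print(match(index, "  Молоко "))  # {'Milk'}
    print(match(index, "rice"))       # set()
```

A real catalogue would need its own selectors, and ambiguous variants would likely still need manual review. The inverted index gives an average constant-time dictionary lookup per term, which suits on-the-fly matching.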

Bibliographic details

Source
Conference of Open Innovations Association | 2016 | 644p | 8 pages
Author
Maxim Lapaev
Original format: PDF
Chinese Library Classification: TP309-53
Keywords
Data mining; Thesauri; Manuals; Ontologies; Databases; Semantic Web

Date added: 2022-08-20 23:10:19

Similar literature

1. L-wrappers: concepts, properties and construction - A declarative approach to data extraction from web sources [J]. Badica C, Badica A, Popescu E. Soft computing: A fusion of foundations, methodologies and applications. 2007, no. 8.

2. Automating Data Mart Construction from Semi-structured Data Sources [J]. Scriney Michael, McCarthy Suzanne, McCarren Andrew. The Computer Journal. 2019, no. 3.

3. LaSEWeb: Automating Search Strategies over Semi-structured Web Data [J]. Oleksandr Polozov, Sumit Gulwani. SIGKDD Explorations. 2014, CD-ROM.

4. Automated extraction of concept matcher thesaurus from semi-structured catalogue-like sources of data on the web [C]. Maxim Lapaev. Proceedings of the 18th Conference of Open Innovations Association FRUCT and Seminar on Information Security and Protection of Information Technology. 2016.

5. Entity information extraction using structured and semi-structured resources [D]. Sil, Avirup. 2014.

6. Using machine learning for concept extraction on clinical documents from multiple data sources [O]. Manabu Torii, Kavishwar Wagholikar, Hongfang Liu. 2011.

7. WEIDJ: Development of a new algorithm for semi-structured web data extraction [O]. Ily Amalina Ahmad Sabri, Mustafa Man. 2021.
