Unsupervised information extraction from unstructured, ungrammatical data sources on the World Wide Web

Matthew Michelson; Craig A. Knoblock

Abstract

Information extraction from unstructured, ungrammatical data such as classified listings is difficult because traditional structural and grammatical extraction methods do not apply. Previous work has exploited reference sets to aid such extraction, but it did so using supervised machine learning. In this paper, we present an unsupervised approach that both selects the relevant reference set(s) automatically and then uses it for unsupervised extraction. We validate our approach with experimental results that show our unsupervised extraction is competitive with supervised machine learning approaches, including the previous supervised approach that exploits reference sets.

Bibliographic details

Source: International Journal on Document Analysis and Recognition, 2007, Issue 4, pp. 211-226 (16 pages)
Authors: Matthew Michelson; Craig A. Knoblock
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology

Similar literature

1. Web-Scale Information Extraction from Unstructured and Ungrammatical Data Sources [J]. Madhavi K. Sarjare, S. L. Vaikole. International Journal of Computer Science Engineering and Information Technology Research, 2014, Issue 2.
2. Creating Relational Data from Unstructured and Ungrammatical Data Sources [J]. Knoblock C. A., Michelson M. The Journal of Artificial Intelligence Research, 2008, Issue 12.
3. Creating Relational Data from Unstructured and Ungrammatical Data Sources [J]. M. Michelson, C. A. Knoblock. Journal of Automation, Mobile Robotics & Intelligent Systems, 2008, Issue 1.
4. Phoebus: A System for Extracting and Integrating Data from Unstructured and Ungrammatical Sources [C]. Matthew Michelson, Craig A. Knoblock. National Conference on Artificial Intelligence (AAAI-06); Innovative Applications of Artificial Intelligence Conference (IAAI-06), 2006.
5. A reference-set approach to information extraction from unstructured, ungrammatical data sources [D]. Michelson, Matthew. 2009.
6. EagleEye: A Worldwide Disease-Related Topic Extraction System Using a Deep Learning Based Ranking Algorithm and Internet-Sourced Data [O]. Beakcheol Jang, Myeonghwi Kim, Inhwan Kim. 2021.
7. Unsupervised information extraction from unstructured, ungrammatical data sources on the World Wide Web [O]. Craig A. Knoblock. 2007.
8. Automated Extraction and Characterisation of Social Network Data from Unstructured Sources: An Ontology-Based Approach [R]. Martineau, E., Lecocq, R. 2013.
