Information extraction from semi-structured and un-structured documents using probabilistic context free grammar inference

Abstract

A large number of research papers are available in unstructured (text) format. Knowledge discovery in unstructured documents has been recognized as a promising task. These documents are typically formatted for human viewing, and the formatting varies widely from document to document. Frequent changes in formatting make it difficult to construct a global schema, so discovering interesting rules from such documents is a complex and tedious process. Recently, conditional random fields (CRFs) and hand-coded wrappers have been used to label the text (for example Title, Author Name(s), Affiliation, Email, and Contact Number in research papers). In this paper we propose a novel hybrid approach that infers grammar rules using alignment similarity and probabilistic context-free grammar. It helps in extracting the desired information from the document.

Bibliographic details

Source: International Conference on Information Retrieval and Knowledge Management, 2012, 4 pages
Authors: Thakur Ramesh; Jain Suresh; Chaudhari Narendra S.; Singhai Rahul
Original format: PDF
Chinese Library Classification: G354-53

Similar literature

1. Learning Probabilistic Hierarchical Task Networks as Probabilistic Context-Free Grammars to Capture User Preferences [J]. Nan Li, William Cushing, Subbarao Kambhampati. ACM Transactions on Intelligent Systems. 2014, No. 2

2. Learning to Extract Information from Semi-structured Text using a Discriminative Context Free Grammar [J]. Paul Viola, Mukund Narasimhan. ACM SIGIR Forum. 2005, Special issue

3. Structure detection and segmentation of documents using 2D stochastic context-free grammars [J]. Alvaro Francisco, Cruz Francisco, Sanchez Joan-Andreu. Neurocomputing. 2015, Feb. 20, Pt. A

4. Information extraction from semi-structured and un-structured documents using probabilistic context free grammar inference [C]. Thakur Ramesh, Jain Suresh, Chaudhari Narendra S. Information Retrieval & Knowledge Management (CAMP), 2012 International Conference on. 2012

5. Preference Grammars and Decoding Algorithms for Probabilistic Synchronous Context Free Grammar Based Translation [D]. Venugopal, Ashish. 2009

6. Searching for universal model of amyloid signaling motifs using probabilistic context-free grammars [O]. Witold Dyrka, Marlena Gąsior-Głogowska, Monika Szefczyk. 2021

7. Information Extraction from the Un-Structured Document using Grammatical Inference and Alignment Similarity [O]. Thakur Ramesh, Jain Suresh, Chaudhari Narendra S. 2012

8. Inside/Outside Algorithm: Grammatical Inference Applied to Stochastic Context-Free Grammars [R]. Dodd, L. 1988
