Stanford parser based approach for extraction of Link-Context from non-descriptive Anchor-Text

Abstract

Link context analysis has been widely explored as a way to determine the context of a target web page. Most researchers, however, have considered only descriptive (meaningful) anchor text and have ignored non-descriptive anchor text. An examination of the World Wide Web shows that a good percentage of web pages can be reached by following non-descriptive anchor text. This paper therefore proposes and implements an algorithm for Link Context Determination (LCD) that determines the context of non-descriptive anchor text. A corpus of web pages belonging to a common domain was considered first. The pages were then analyzed manually to discover the relations between the anchor text and the words in its vicinity. Based on these relationships, a number of rules were formed and represented as a tree. The proposed architecture for LCD uses three components: (1) Stanford parser, (2) Rules, and (3) Link Context Determination. An input sentence is given to the Stanford parser, which creates a parse tree for it. The link context determiner then uses this parse tree together with the appropriate rules tree to determine the link context. The approach has been implemented and validated on a limited sample of non-descriptive anchor texts (ATs). The results show that the proposed LCD extracted the actual link context for 100% of the non-descriptive ATs considered.
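The pipeline described in the abstract (parse tree in, rule applied, link context out) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the abstract does not list the paper's rules, so the single rule used here ("take the nearest noun phrase outside the anchor's own branch") is hypothetical, the bracketed parse string is hand-written in the Penn Treebank style that the Stanford parser emits rather than produced by the parser itself, and the function names are invented.

```python
# Hypothetical sketch: the parse string and the single rule below are
# illustrative assumptions, not the rules tree from the paper.

def parse_brackets(text):
    """Turn a Penn-Treebank-style bracketed string into (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())   # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1                          # skip ")"
        return (label, children)

    return read()


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in leaves(child)]


def find_path(node, word, path=()):
    """Return the ancestors (root first) of the first leaf equal to `word`, or None."""
    if isinstance(node, str):
        return path if node == word else None
    for child in node[1]:
        found = find_path(child, word, path + (node,))
        if found is not None:
            return found
    return None


def noun_phrases(node):
    """Yield the text of each outermost NP under a node, left to right."""
    if isinstance(node, str):
        return
    if node[0] == "NP":
        yield " ".join(leaves(node))
        return
    for child in node[1]:
        yield from noun_phrases(child)


def link_context(tree, anchor_word):
    """Assumed rule: climb from the anchor and return the first NP found
    in a branch that does not contain the anchor itself."""
    path = find_path(tree, anchor_word)
    if path is None:
        return None
    for i in range(len(path) - 1, -1, -1):
        ancestor = path[i]
        on_path = path[i + 1] if i + 1 < len(path) else None
        for child in ancestor[1]:
            if child is on_path:
                continue                  # skip the branch holding the anchor
            for phrase in noun_phrases(child):
                return phrase
    return None


# Hand-written parse of "Click here to download the annual report".
PARSE = ("(S (VP (VB Click) (ADVP (RB here)) "
         "(S (VP (TO to) (VP (VB download) "
         "(NP (DT the) (JJ annual) (NN report)))))))")

tree = parse_brackets(PARSE)
print(link_context(tree, "here"))  # prints: the annual report
```

In a real system the bracketed string would come from the parser, and the single hard-coded rule would be replaced by a lookup in the rules tree that the authors derived from their corpus.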


Bibliographic details

Source
International Conference on Reliability, Infocom Technologies and Optimization | 2014 | 6 pages
Authors
Kumar Narendra; Singh Monika;
Original format: PDF
Classification (Chinese Library Classification): Computing technology, computer technology
Keywords
Web sites; grammars; natural language processing; semantic Web; text analysis; LCD; Stanford parser based approach; Web page; World Wide Web; link context analysis; link context determination; link context extraction; nondescriptive anchor text; undiscriptive anchor text; Databases; Indium phosphide; Knowledge discovery; Organizations; Focused-Crawling; Information Extraction; Link Context Determiner (LCD); NLP; Semantic web crawling; Semantic-Web; Stanford Parser;


Similar literature


1. Extraction of complex index terms in non-English IR: A shallow parsing based approach [J]. Jesus Vilares, Miguel A. Alonso, Manuel Vilares. Information Processing & Management. 2008, No. 4
2. A pattern-based approach to detect and improve non-descriptive test names [J]. Jianwei Wu, James Clause. The Journal of Systems and Software. 2020, issue Octa
3. Parametric and nonparametric context models: A unified approach to scene parsing [J]. Aliniya Parvaneh, Razzaghi Parvin. Pattern Recognition: The Journal of the Pattern Recognition Society. 2018
4. Stanford parser based approach for extraction of Link-Context from non-descriptive Anchor-Text [C]. Kumar Narendra, Singh Monika. International Conference on Reliability, Infocom Technologies and Optimization. 2014
5. Using a named entity tagger and a syntactic parser to improve Web-based answer extraction [D]. Kamel, Yasser. 2004
6. Applying Semantic-based Probabilistic Context-Free Grammar to Medical Language Processing – A Preliminary Study on Parsing Medication Sentences [O]. Hua Xu, Samir AbdelRahman, Yanxin Lu
7. Deeper: A full parsing based approach to protein relation extraction [O]. Timur Fayruzov, Martine De Cock, Chris Cornelis. 2013
8. Table-Driven Approach to Fast Context-Free Parsing [R]. Kipps, J. R. 1988
