Journal: Concurrency and Computation: Practice and Experience

Mining the web to discover acronym-definitions based on sequence labeling and iterative query expansion model



Abstract

Finding the definitions associated with an acronym is useful for many applications, such as web search, information retrieval, natural language processing, ontology mapping, and question answering. Large-scale, manually built acronym-definition repositories are available online, but keeping them up to date is difficult. To address this problem, previous research has used either heuristics or machine learning to automate the detection of acronym-definition pairs. This article presents a heuristic approach, based on a rule-based sequence-labeling model, for finding the list of definitions associated with an acronym on the Web. An organic search result includes the website title, URL, meta description, and site links. In the proposed work, search engine result pages are used as the corpus, and sequence labeling is performed through character-level and word-level mapping schemes. In these schemes, the desirable properties of a definition are expressed as rules. Each identified definition is validated with a collocation measure. The results are then assessed against a manually built acronym-definition repository, Acronym Finder. The proposed model improves the coverage of manual repositories while maintaining high precision and recall.
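
As a rough illustration of what a character-level mapping rule might look like, the sketch below aligns each letter of an acronym with the initial of consecutive words in a search snippet. This is not the authors' rule set: the function name `candidate_definitions`, the `STOPWORDS` list, and the skip-stopword rule are all assumptions made for the example, and the collocation-based validation step is not shown.

```python
import re

# Hypothetical stopword list; the article does not specify one.
STOPWORDS = {"of", "and", "the", "for", "in", "on", "to", "a", "an"}


def candidate_definitions(acronym, snippet):
    """Return phrases in `snippet` whose word initials spell `acronym`.

    Assumed rule: each acronym letter must match the first character of
    the next word; a stopword that does not match may be skipped.
    """
    words = re.findall(r"[A-Za-z]+", snippet)
    letters = acronym.lower()
    found = []
    for start in range(len(words)):
        # A definition is assumed never to begin with a stopword.
        if words[start].lower() in STOPWORDS:
            continue
        matched = 0      # number of acronym letters aligned so far
        end = start      # index of the next word to examine
        while end < len(words) and matched < len(letters):
            word = words[end].lower()
            if word[0] == letters[matched]:
                matched += 1          # word initial maps to this letter
            elif word not in STOPWORDS:
                break                 # mismatch on a content word
            end += 1
        if matched == len(letters):
            phrase = " ".join(words[start:end])
            if phrase not in found:
                found.append(phrase)
    return found


if __name__ == "__main__":
    print(candidate_definitions(
        "FBI", "The Federal Bureau of Investigation (FBI) is an agency"))
```

A real system would add further rules (for example, letters taken from inside a word) and would then filter the candidates, which is where the article's collocation measure comes in.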
