Journal: DESIDOC Journal of Library & Information Technology

Rule based Text Extraction from a Bibliographic Database

Abstract

The emergent concept of ‘Big Data’ has shifted the paradigm from information retrieval to information extraction. Information extraction techniques enable corpus analysis that yields useful interpretations and applications. The choice of an appropriate information extraction technique depends on the type of data being handled and its intended applications. In an R&D environment, published information is considered an authenticated benchmark for studying and analysing growth patterns in a given field of science, medicine or business. This paper proposes a rule-based information extraction process applied to selected data retrieved from a bibliographic database of published R&D papers. The aim of the study is to build a database of relevant concepts, clean the retrieved data, and automate information retrieval in the local database. For this purpose, concept-based ‘subject profiles’ in the area of advanced semiconductors were developed, together with rules for extracting text from the metadata retrieved from the bibliographic database. The resulting subset was used as input to a knowledge domain that supports R&D in ‘advanced semiconductor materials and devices’ and provides information services on the Intranet. The study found that concept-based pattern matching on the downloaded datasets yielded better results than the controlled vocabulary of the source database.
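The abstract does not publish the extraction rules or the subject profiles themselves. The following is a minimal sketch of the general idea: rule-based field extraction, cleaning, and concept-based pattern matching. It rests on several assumptions. Records are exported in a RIS-style tagged format (`TI` title, `AB` abstract, `KW` keyword), and a subject profile is simply a mapping from concept names to regular expressions. The names `SUBJECT_PROFILE`, `FIELDS_OF_INTEREST`, `parse_record` and `match_concepts` are hypothetical, and the semiconductor terms are illustrative placeholders. None of this is the authors' actual rule set.

```python
import re
from typing import Dict, List

# Hypothetical subject profile: concept name -> regular expressions that
# signal the concept. The terms are illustrative placeholders only.
# Abbreviations are matched case-sensitively; phrases use the (?i) flag.
SUBJECT_PROFILE: Dict[str, List[str]] = {
    "wide-bandgap semiconductors": [
        r"(?i)\bgallium nitride\b", r"\bGaN\b",
        r"(?i)\bsilicon carbide\b", r"\bSiC\b",
    ],
    "semiconductor devices": [r"\bHEMTs?\b", r"\bMOSFETs?\b", r"(?i)\bdiodes?\b"],
}

# Hypothetical extraction rules for a RIS-style tagged export:
# one "TAG  - value" line per field; keep title, abstract and keywords.
FIELDS_OF_INTEREST = {"TI", "AB", "KW"}
FIELD_LINE = re.compile(r"^([A-Z][A-Z0-9])  - (.*)$")


def parse_record(text: str) -> Dict[str, str]:
    """Extract the fields of interest from one tagged record and clean them."""
    fields: Dict[str, List[str]] = {}
    for line in text.splitlines():
        m = FIELD_LINE.match(line)
        if m and m.group(1) in FIELDS_OF_INTEREST:
            fields.setdefault(m.group(1), []).append(m.group(2))
    # Cleaning: join repeated tags (e.g. several KW lines) and collapse whitespace.
    return {tag: " ".join(" ".join(vals).split()) for tag, vals in fields.items()}


def match_concepts(record: Dict[str, str], profile: Dict[str, List[str]]) -> List[str]:
    """Return profile concepts with at least one pattern in the extracted text."""
    text = " ".join(record.get(tag, "") for tag in ("TI", "AB", "KW"))
    return [
        concept
        for concept, patterns in profile.items()
        if any(re.search(p, text) for p in patterns)
    ]


if __name__ == "__main__":
    sample = (
        "TY  - JOUR\n"
        "TI  - Thermal   behaviour of GaN HEMTs\n"
        "AB  - Self-heating in gallium nitride transistors.\n"
        "ER  - \n"
    )
    record = parse_record(sample)
    print(record)
    print(match_concepts(record, SUBJECT_PROFILE))
```

Because the profile is plain data, new concepts or variant spellings can be added without touching the extraction code. In this sketch, a record is tagged with a concept as soon as any one of its patterns appears in the title, abstract or keywords.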