IEEE International Conference on Consumer Electronics - Taiwan

Research on Extracting Named Entities in Software Engineering Field from Wiki Webpage

Abstract

Extracting entity concepts from wiki pages is a common approach to entity recognition. Common named entity recognition methods are based on Conditional Random Fields (CRF) and on rules, for example HDSKG (Harvesting Domain Specific Knowledge Graph from Content of Webpages). However, HDSKG does not fully account for the characteristics of entity concepts and term phrases in the software engineering field. To address this problem, we propose a more efficient algorithm. We first construct a domain dictionary from webpage titles, and then design regular-expression rules based on the features of entity concepts in the software engineering field. Next, the domain dictionary is used to improve the NP chunks produced during chunking. Experimental results show that, compared with HDSKG, the proposed algorithm significantly improves the number of extracted entities, accuracy, precision, and recall.
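The abstract only outlines the pipeline (title dictionary, rules, dictionary-assisted NP chunking). The Python sketch below illustrates one way those three steps could fit together. It is not the authors' implementation and rests on several assumptions: sentences arrive already POS-tagged with Penn Treebank tags, the baseline chunker is a simple JJ*/NN* run detector, and the surface-form rules in `SE_TOKEN_RULES` as well as every function name are hypothetical. It shows how a dictionary built from page titles keeps a multi-word term such as "Ruby on Rails" together, where a POS-based chunker splits it at the preposition.

```python
import re
from typing import Iterable

Tagged = list[tuple[str, str]]  # (token, Penn Treebank POS tag) pairs

# Wiki titles often carry a disambiguator, e.g. "Java (programming language)".
_DISAMBIGUATOR = re.compile(r"\s*\([^)]*\)\s*$")

# Hypothetical surface-form rules for software-engineering terms;
# illustrative only, not the rules used in the paper.
SE_TOKEN_RULES = [
    re.compile(r"^[A-Za-z]+(?:\+\+|#)$"),             # C++, C#, F#
    re.compile(r"^[A-Za-z]+\.(?:js|net|io)$", re.I),  # Node.js, ASP.NET
    re.compile(r"^[A-Za-z]*[a-z][A-Z][A-Za-z]*$"),    # internal capital: JavaScript, PostgreSQL
    re.compile(r"^[A-Z]{2,}[0-9]*$"),                 # acronyms: HTTP, HTML5, JSON
]


def build_domain_dictionary(titles: Iterable[str]) -> set[tuple[str, ...]]:
    """Turn wiki page titles into lower-cased token tuples."""
    dictionary = set()
    for title in titles:
        name = _DISAMBIGUATOR.sub("", title).replace("_", " ").strip()
        if name:
            dictionary.add(tuple(name.lower().split()))
    return dictionary


def matches_se_rule(token: str) -> bool:
    return any(rule.match(token) for rule in SE_TOKEN_RULES)


def np_chunks(tagged: Tagged, blocked: frozenset[int] = frozenset()) -> list[tuple[int, int]]:
    """Baseline chunker: maximal runs of JJ*/NN* tokens that end in a noun.

    Returns half-open (start, end) spans; indices in `blocked` break a run.
    """
    spans, start = [], None
    for i, (_, tag) in enumerate(tagged + [("", "END")]):  # sentinel closes the last run
        inside = i not in blocked and tag.startswith(("JJ", "NN"))
        if inside and start is None:
            start = i
        elif not inside and start is not None:
            if tagged[i - 1][1].startswith("NN"):
                spans.append((start, i))
            start = None
    return spans


def dictionary_spans(tokens: list[str], dictionary: set[tuple[str, ...]]) -> list[tuple[int, int]]:
    """Greedy left-to-right longest match of dictionary entries."""
    lowered = [t.lower() for t in tokens]
    max_len = max((len(entry) for entry in dictionary), default=0)
    spans, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in dictionary:
                spans.append((i, i + n))
                i += n
                break
        else:  # no dictionary entry starts at position i
            i += 1
    return spans


def extract_entities(tagged: Tagged, dictionary: set[tuple[str, ...]]) -> list[str]:
    """Dictionary terms become chunks of their own; the remaining NP chunks
    are kept only if one of their tokens matches a surface-form rule."""
    tokens = [word for word, _ in tagged]
    spans = dictionary_spans(tokens, dictionary)
    covered = frozenset(i for s, e in spans for i in range(s, e))
    for s, e in np_chunks(tagged, blocked=covered):
        if any(matches_se_rule(t) for t in tokens[s:e]):
            spans.append((s, e))
    return [" ".join(tokens[s:e]) for s, e in sorted(spans)]


if __name__ == "__main__":
    titles = ["Ruby on Rails", "Java (programming language)", "Visual Studio Code"]
    dictionary = build_domain_dictionary(titles)
    # Pre-tagged input: assumes an external POS tagger has already run.
    tagged = [("Ruby", "NNP"), ("on", "IN"), ("Rails", "NNPS"), ("apps", "NNS"),
              ("often", "RB"), ("call", "VBP"), ("a", "DT"), ("JSON", "NNP"),
              ("API", "NNP"), ("written", "VBN"), ("in", "IN"), ("Java", "NNP")]
    print([" ".join(w for w, _ in tagged[s:e]) for s, e in np_chunks(tagged)])
    print(extract_entities(tagged, dictionary))
```

Running it prints the baseline chunks `['Ruby', 'Rails apps', 'JSON API', 'Java']` and then the refined entities `['Ruby on Rails', 'JSON API', 'Java']`. Letting dictionary matches override POS-based chunk boundaries is only one plausible reading of "using the domain dictionary to improve the NP chunks"; the paper may combine the dictionary, rules, and chunker differently.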