Extracting entity concepts from wiki pages is a common approach to entity recognition. Common named entity recognition methods are based on Conditional Random Fields (CRF) and rules, as in Harvesting Domain Specific Knowledge Graph from Content of Webpages (HDSKG). However, HDSKG does not fully account for the features of entity concepts and term phrases in the software engineering domain. To address this problem, we propose a more efficient algorithm. We first use webpage titles to construct a domain dictionary, and then design regular rules according to the features of entity concepts in the software engineering field. Next, the domain dictionary is used to improve the NP (noun phrase) chunks produced in the chunking process. Experimental results show that, compared with HDSKG, the proposed algorithm achieves significant improvements in the number of entities, accuracy, precision, and recall.
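As a rough illustration of the dictionary step, the following minimal sketch builds a dictionary from page titles and then uses greedy longest-match lookup to merge token runs into single entity chunks. It is not the algorithm proposed here: the function names, the title normalization, the `ENTITY`/`O` labels, and the sample titles are all assumptions made for illustration, and the rule design and the actual NP chunker are omitted.

```python
import re


def build_domain_dictionary(titles):
    """Turn page titles into a set of lowercase token tuples (hypothetical helper)."""
    dictionary = set()
    for title in titles:
        # Assumption: drop a trailing parenthetical disambiguator,
        # e.g. "Python (programming language)" becomes "python".
        cleaned = re.sub(r"\s*\([^)]*\)\s*$", "", title).strip().lower()
        if cleaned:
            dictionary.add(tuple(cleaned.split()))
    return dictionary


def chunk_with_dictionary(tokens, dictionary):
    """Greedy longest match: merge token runs found in the dictionary into one chunk."""
    max_len = max((len(entry) for entry in dictionary), default=1)
    chunks = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate span first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in dictionary:
                match_len = n
                break
        if match_len:
            chunks.append((" ".join(tokens[i:i + match_len]), "ENTITY"))
            i += match_len
        else:
            chunks.append((tokens[i], "O"))
            i += 1
    return chunks


# Made-up titles and sentence, used only to exercise the sketch.
titles = ["Java virtual machine", "Python (programming language)", "Garbage collection"]
dictionary = build_domain_dictionary(titles)
tokens = "The Java virtual machine performs garbage collection".split()
chunks = chunk_with_dictionary(tokens, dictionary)
```

In this sketch, multi-word terms such as "Java virtual machine" stay together as one chunk instead of being split by a generic chunker, which is the kind of benefit a title-derived dictionary is meant to provide.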