Published in: CIPS-SIGHAN Joint Conference on Chinese Language Processing

A Study on Personal Attributes Extraction Based on the Combination of Sentences Classifications and Rules

Abstract

Personal attribute extraction plays a significant role in information mining, event tracing, and personal name disambiguation. It involves two main problems: recognizing attributes, and deciding whether a recognized attribute belongs to the person being extracted. Personal attributes generally involve named entities, which are recognized mainly by adjusting word segmentation software. Attributes that word segmentation cannot recognize are identified with a combination of feature words and rules. Attribute ownership is decided by combining sentence classification with rules. First, all sentences in the document are classified into those with attribute words and those without, and the latter are discarded. The former are then classified into description sentences with one person and description sentences with multiple persons, according to whether more than one person is described in the sentence. Statistics on description sentences with one person show that anaphora resolution is not necessary for them, which avoids recognition errors caused by anaphora resolution failures. For description sentences with multiple persons, minimum slicing is used: the attribute ownership decision is made within the minimum language segment in which the person and the attribute co-occur. In the CIPS-SIGHAN2014 Bakeoff, this method achieved SF_Value scores of 0.507388780 under the lenient evaluation and 0.489505010 under the strict evaluation, which turned out to be the best results. These results show that the method is effective.
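
The ownership decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cue-word list, the function names, the assumption that person names are already known, and the use of commas and semicolons as slicing points are all hypothetical choices made for illustration, and English text stands in for the Chinese data used in the Bakeoff.

```python
import re

# Hypothetical cue words; the lexicon used in the paper is not given here.
ATTRIBUTE_WORDS = ("born in", "graduated from", "works at")


def find_attributes(segment):
    """Return the cue words that occur in a text segment."""
    return [w for w in ATTRIBUTE_WORDS if w in segment]


def find_persons(segment, persons):
    """Return the known person names that occur in a text segment."""
    return [p for p in persons if p in segment]


def classify(sentence, persons):
    """Label a sentence 'none', 'single', or 'multi'."""
    if not find_attributes(sentence):
        return "none"  # no attribute word: the sentence is dropped
    return "multi" if len(find_persons(sentence, persons)) > 1 else "single"


def extract(text, persons):
    """Return (person, attribute cue, segment) triples in document order."""
    results = []
    for sentence in re.findall(r"[^.!?]+[.!?]?", text):
        sentence = sentence.strip()
        label = classify(sentence, persons)
        if label == "none":
            continue
        if label == "single":
            # One person at most: the whole sentence is the segment.
            segments = [sentence]
        else:
            # Assumed slicing rule: cut multi-person sentences at , and ;
            segments = [s.strip() for s in re.split(r"[,;]", sentence)]
        for seg in segments:
            owners = find_persons(seg, persons)
            # Decide ownership only where exactly one person co-occurs
            # with the attribute inside the segment.
            if len(owners) == 1:
                for attr in find_attributes(seg):
                    results.append((owners[0], attr, seg))
    return results


if __name__ == "__main__":
    doc = ("Li Ming was born in Beijing. "
           "Li Ming graduated from Peking University, "
           "and Wang Fang works at Tsinghua. The weather was fine.")
    for triple in extract(doc, ["Li Ming", "Wang Fang"]):
        print(triple)
```

In this sketch a segment that contains an attribute cue but no person, or more than one person, is simply skipped; how the paper handles such cases is not stated in the abstract.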
