Conference: CIPS-SIGHAN joint conference on Chinese language processing

A Study on Personal Attributes Extraction Based on the Combination of Sentences Classifications and Rules

Abstract

Personal attribute extraction plays a significant role in information mining, event tracing, and personal name disambiguation. It involves two main problems: recognizing attributes, and deciding whether a recognized attribute belongs to the person being extracted. Personal attributes are generally named entities, which are recognized mainly by adjusting word segmentation software. Attributes that word segmentation cannot recognize are identified through a combination of feature words and rules. Attribute ownership is decided by combining sentence classification with rules. First, all sentences in the document are classified into those that contain attribute words and those that do not, and the latter are discarded. The former are then classified into single-person description sentences and multi-person description sentences, according to whether the sentence describes more than one person. Statistics on single-person description sentences show that anaphora resolution is not necessary for them, which avoids recognition errors caused by anaphora resolution failures. For multi-person description sentences, minimum slicing is applied, and attribute ownership is decided within the smallest language segment in which both the person and the attribute co-occur. The method achieves SF_Value scores of 0.507388780 and 0.489505010 under the lenient and strict evaluations, respectively, in the CIPS-SIGHAN2014 Bakeoff, the best results in that evaluation. These results indicate that the method is effective.
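
The ownership decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `assign_attributes`, the naive substring matching, and the choice of commas and semicolons as segment boundaries for "minimum slicing" are all assumptions made for illustration. The abstract does not say how sentences with attribute words but no person are handled, so the sketch simply skips them.

```python
import re


def assign_attributes(sentences, persons, attribute_words):
    """Hypothetical sketch: map each person to the attribute words assigned to them."""
    owned = {person: set() for person in persons}
    for sentence in sentences:
        # Step 1: sentences without any attribute word are omitted.
        attrs = [a for a in attribute_words if a in sentence]
        if not attrs:
            continue
        present = [p for p in persons if p in sentence]
        if len(present) == 1:
            # Step 2a: single-person sentence, so every attribute goes to
            # that person without anaphora resolution.
            owned[present[0]].update(attrs)
        elif len(present) > 1:
            # Step 2b: multi-person sentence. Slice it into small segments
            # (assumed here: split on commas and semicolons) and assign an
            # attribute only where it co-occurs with exactly one person.
            for segment in re.split(r"[,;，；、]", sentence):
                seg_persons = [p for p in present if p in segment]
                seg_attrs = [a for a in attrs if a in segment]
                if len(seg_persons) == 1:
                    owned[seg_persons[0]].update(seg_attrs)
        # Assumption: sentences with attributes but no known person are skipped.
    return owned


example = assign_attributes(
    sentences=[
        "Alice was born in Paris.",
        "Alice is an engineer, and Bob lives in Berlin.",
        "The weather was fine.",
    ],
    persons=["Alice", "Bob"],
    attribute_words=["Paris", "engineer", "Berlin"],
)
print(example)
```

In this toy input, the first sentence is a single-person sentence, the second is sliced at the comma so that each attribute lands with the person in the same segment, and the third is dropped because it contains no attribute word. A real system would replace the substring checks with the named entity recognition and feature-word rules that the abstract mentions.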
