Journal: IEICE Transactions on Information and Systems

Discovery of Predicate-Oriented Relations among Named Entities Extracted from Thai Texts

Abstract

Extracting named entities (NEs) and their relations is more difficult in Thai than in other languages because of several Thai-specific characteristics: no explicit boundaries for words, phrases, and sentences; few case markers and modifier clues; high ambiguity in compound words and serial verbs; and flexible word order. Unlike most previous work, which focused on NE relations for specific actions such as work_for, live_in, located_in, and kill, this paper proposes a more general type of NE relation, called the predicate-oriented relation (PoR), in which an extracted action part (verb) serves as the core component that associates related named entities extracted from Thai texts. Because no practical parser is available for the Thai language, we present three types of surface features (punctuation marks, such as token spaces; entity types; and the number of entities) and apply five commonly used alternative learning schemes to investigate their performance on predicate-oriented relation extraction. The experimental results show that our approach achieves F-measures of 97.76%, 99.19%, 95.00%, and 93.50% on four types of predicate-oriented relation (action-location, location-action, action-person, and person-action, respectively) in crime-related news documents, using a data set of 1,736 entity pairs. The effects of NE extraction techniques, feature sets, and class imbalance on the performance of relation extraction are also explored.
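For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F-measure is commonly computed from classification counts. It assumes the balanced form (beta = 1), which the abstract does not state explicitly. The function name `f_measure` and the example counts are hypothetical and are not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from true-positive, false-positive,
    and false-negative counts. beta = 1 gives the balanced F-measure
    (an assumption; the abstract does not specify beta)."""
    if tp == 0:
        # No correct positive predictions: precision and recall are
        # both zero (or undefined), so the score is taken to be 0.
        return 0.0
    precision = tp / (tp + fp)  # share of predicted relations that are correct
    recall = tp / (tp + fn)     # share of true relations that were found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only, not results from the paper.
score = f_measure(tp=90, fp=10, fn=10)
print(f"F-measure: {score:.2%}")
```

With class imbalance, which the abstract lists as one of the studied effects, the F-measure of the minority class is usually more informative than plain accuracy, since it ignores true negatives.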
