On the Feasibility of Automated Detection of Allusive Text Reuse

Abstract

The detection of allusive text reuse is particularly challenging because allusive references rely on sparse evidence, commonly few or no shared words. Arguably, lexical semantics can help, since uncovering semantic relations between words has the potential to increase the support underlying the allusion and alleviate the lexical sparsity. A further obstacle is the lack of evaluation benchmark corpora, largely due to the highly interpretative character of the annotation process. In the present paper, we aim to elucidate the feasibility of automated allusion detection. We approach the matter from an Information Retrieval perspective in which referencing texts act as queries and referenced texts as relevant documents to be retrieved, and we estimate the difficulty of benchmark corpus compilation through a novel inter-annotator agreement study on query segmentation. Furthermore, we investigate to what extent integrating lexical semantic information derived from distributional models and ontologies can aid in retrieving cases of allusive reuse. The results show that (i) despite low agreement scores, using manual queries considerably improves retrieval performance with respect to a windowing approach, and that (ii) retrieval performance can be moderately boosted with distributional semantics.
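
As an illustration of the retrieval framing described in the abstract, the following is a minimal sketch, not the authors' method, of how word-level semantic similarity could lend support to a query that shares no words with the referenced text. The similarity table, its values, the toy query and documents, and all function names are hypothetical and chosen only for illustration; a real system would take similarities from a distributional model or an ontology.

```python
# Hypothetical word-similarity table standing in for a distributional model.
# The pairs and values are invented for illustration only.
SIMILARITY = {
    ("shepherd", "pastor"): 0.8,
    ("flock", "sheep"): 0.7,
}


def word_sim(a, b):
    """Return 1.0 for identical words, a table value for related words, else 0.0."""
    if a == b:
        return 1.0
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


def exact_score(query, doc):
    """Fraction of query words that also occur in the document."""
    if not query:
        return 0.0
    return sum(1 for q in query if q in doc) / len(query)


def soft_score(query, doc):
    """Average, over query words, of the best similarity to any document word."""
    if not query or not doc:
        return 0.0
    return sum(max(word_sim(q, d) for d in doc) for q in query) / len(query)


def rank(query, docs, score):
    """Return document ids sorted by descending score for the given query."""
    return sorted(docs, key=lambda doc_id: score(query, docs[doc_id]), reverse=True)


# Toy data: the query (referencing text) shares no word with document d1.
query = ["pastor", "sheep"]
docs = {
    "d1": ["shepherd", "flock", "field"],
    "d2": ["king", "city", "wall"],
}

print(exact_score(query, docs["d1"]))  # exact matching finds no evidence
print(soft_score(query, docs["d1"]))   # soft matching finds some support
print(rank(query, docs, soft_score))
```

In this toy setting exact matching gives the allusive pair no score at all, while the soft score credits related words, which is the kind of sparsity relief the abstract attributes to lexical semantics.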