
On the Feasibility of Automated Detection of Allusive Text Reuse

Abstract

The detection of allusive text reuse is particularly challenging because allusive references rest on sparse evidence, commonly few or no shared words. Lexical semantics arguably offers a remedy: uncovering semantic relations between words can increase the support underlying an allusion and alleviate lexical sparsity. A further obstacle is the lack of evaluation benchmark corpora, largely due to the highly interpretative character of the annotation process. In the present paper, we aim to elucidate the feasibility of automated allusion detection. We approach the matter from an Information Retrieval perspective in which referencing texts act as queries and referenced texts as the relevant documents to be retrieved, and we estimate the difficulty of benchmark corpus compilation through a novel inter-annotator agreement study on query segmentation. Furthermore, we investigate to what extent integrating lexical semantic information derived from distributional models and ontologies can aid the retrieval of allusive reuse. The results show that (i) despite low agreement scores, using manual queries considerably improves retrieval performance over a windowing approach, and that (ii) retrieval performance can be moderately boosted with distributional semantics.
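The retrieval framing in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the bag-of-words cosine scoring, and the hand-written `neighbours` dictionary (a toy stand-in for related words that a distributional model might supply) are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def windows(tokens, size, step):
    """Cut a referencing text into fixed-size windows that serve as queries.

    Trailing tokens that do not fill a whole window are dropped when
    step > 1; a real system would need to decide how to handle them.
    """
    if len(tokens) <= size:
        return [tokens]
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, step)]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def expand(tokens, neighbours, weight=0.5):
    """Build a query bag and add related words at a reduced weight.

    `neighbours` is a hypothetical word -> related-words mapping.
    """
    bag = Counter()
    for t in tokens:
        bag[t] += 1.0
        for n in neighbours.get(t, ()):
            bag[n] += weight
    return bag


def rank(query, docs, neighbours=None):
    """Return document ids ordered by similarity to the (expanded) query."""
    q = expand(query, neighbours or {})
    scores = {doc_id: cosine(q, Counter(toks)) for doc_id, toks in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: the query shares no word with the document it alludes to.
docs = {
    "d1": ["shepherd", "leads", "flock"],
    "d2": ["king", "rules", "land"],
}
query = ["pastor", "guides", "sheep"]
neighbours = {"pastor": ["shepherd"], "sheep": ["flock"]}

print(cosine(Counter(query), Counter(docs["d1"])))  # no lexical overlap
print(rank(query, docs, neighbours))                # expansion surfaces d1
```

In this toy case the plain lexical score is zero because the query and the referenced text share no words, while the expanded query ranks `d1` first. This mirrors, in a simplified way, how semantic relations could alleviate lexical sparsity.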