Venue: International Conference on Recent Advances in Natural Language Processing (RANLP)

A Graph-based Text Similarity Measure That Employs Named Entity Information

Abstract

Text comparison is an interesting but difficult task with many applications in Natural Language Processing. This work introduces a new text-similarity measure that combines named-entity information extracted from the texts with the n-gram graph model for representing documents. The measure uses OpenCalais as a named-entity recognition service and the JINSECT toolkit to construct and manage n-gram graphs, and it is embedded in a text clustering algorithm (k-Means). Evaluating the resulting clusters with various clustering validity metrics shows that extracting named entities as a first step can improve the time performance of similarity measures based on the n-gram graph representation, without affecting the overall performance of the NLP task.
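The abstract does not spell out how the graphs are built or compared, so the following is only a minimal sketch under stated assumptions. It reimplements a character n-gram graph and a similarity score modelled on the Value Similarity operator from the n-gram graph literature in plain Python; it does not call JINSECT or OpenCalais. The helper names `ngram_graph`, `value_similarity`, `entity_graph` and `entity_similarity`, the defaults `n = 3` and `window = 3`, and the idea of graphing only the extracted entity strings are illustrative assumptions, not the paper's method.

```python
from collections import Counter


def ngram_graph(text: str, n: int = 3, window: int = 3) -> Counter:
    """Build a character n-gram graph as a Counter of undirected edges.

    Nodes are the character n-grams of `text`. An edge joins two n-grams
    whose start positions are at most `window` apart, and its weight is the
    number of times that pair co-occurs in the text.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges: Counter = Counter()
    for i, gram in enumerate(grams):
        for j in range(i + 1, min(i + window + 1, len(grams))):
            # Sort the pair so (a, b) and (b, a) count as the same edge.
            a, b = sorted((gram, grams[j]))
            edges[(a, b)] += 1
    return edges


def value_similarity(g1: Counter, g2: Counter) -> float:
    """Value-Similarity-style score in [0, 1] (assumed formulation).

    For every edge shared by both graphs, add min(w1, w2) / max(w1, w2);
    then divide by the edge count of the larger graph.
    """
    if not g1 or not g2:
        return 0.0
    shared = g1.keys() & g2.keys()
    total = sum(min(g1[e], g2[e]) / max(g1[e], g2[e]) for e in shared)
    return total / max(len(g1), len(g2))


def entity_graph(entities: list[str], n: int = 3, window: int = 3) -> Counter:
    """Hypothetical entity-only graph: deduplicate and sort the entity
    strings so their order does not matter, then graph the joined string."""
    return ngram_graph(" ".join(sorted(set(entities))), n, window)


def entity_similarity(ents1: list[str], ents2: list[str]) -> float:
    """Hypothetical helper: compare two documents by their entity graphs."""
    return value_similarity(entity_graph(ents1), entity_graph(ents2))


if __name__ == "__main__":
    # Entity lists are typed in by hand here; in the setup the abstract
    # describes, they would come from a NER service such as OpenCalais.
    doc_a = "Apple opened a new office in Berlin."
    doc_b = "Berlin welcomed the new Apple office."
    full = value_similarity(ngram_graph(doc_a), ngram_graph(doc_b))
    ents = entity_similarity(["Apple", "Berlin"], ["Berlin", "Apple"])
    print(f"full-text similarity: {full:.3f}")
    print(f"entity similarity:    {ents:.3f}")
```

In this sketch the score rewards edges that both graphs share with similar weights, and dividing by the larger graph's edge count penalizes a size mismatch. Because the entity-only string is much shorter than the full text, its graph has far fewer edges to build and compare, which is one plausible source of the time savings the abstract reports; the sketch does not reproduce the paper's k-Means setup or its evaluation.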