A Graph-based Text Similarity Measure That Employs Named Entity Information

Abstract

Text comparison is an interesting though hard task, with many applications in Natural Language Processing. This work introduces a new text-similarity measure, which employs named-entity information extracted from the texts and the n-gram graph model for representing documents. Using OpenCalais as a named-entity recognition service and the JINSECT toolkit for constructing and managing n-gram graphs, the text similarity measure is embedded in a text clustering algorithm (k-Means). The evaluation of the produced clusters with various clustering validity metrics shows that extracting named entities as a first step can improve the time performance of similarity measures based on the n-gram graph representation without affecting the overall performance of the NLP task.
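To make the n-gram graph representation named in the abstract more concrete, here is a minimal Python sketch of one common formulation: character n-grams become nodes, n-grams that occur close together are linked by weighted edges, and two graphs are compared by the weight ratio of their shared edges. This is an illustration under stated assumptions, not the JINSECT API and not necessarily the measure the authors evaluate. The function names `ngram_graph` and `value_similarity` and the defaults `n=3`, `window=3` are hypothetical, and the named-entity extraction step is omitted.

```python
from collections import Counter


def ngram_graph(text, n=3, window=3):
    """Build a character n-gram graph as a Counter of undirected weighted edges.

    Hypothetical helper: n and window are illustrative defaults.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, g in enumerate(grams):
        # Link each n-gram to the n-grams that follow it within the window.
        for h in grams[i + 1:i + 1 + window]:
            edges[tuple(sorted((g, h)))] += 1
    return edges


def value_similarity(g1, g2):
    """Sum min/max weight ratios over shared edges, divided by the larger edge count."""
    if not g1 or not g2:
        return 0.0
    shared = g1.keys() & g2.keys()
    total = sum(min(g1[e], g2[e]) / max(g1[e], g2[e]) for e in shared)
    return total / max(len(g1), len(g2))
```

A score like this could serve as the pairwise similarity inside a clustering loop. Restricting the input text to extracted named entities would shrink the graphs, which is one plausible reading of the time-performance benefit the abstract reports.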

Bibliographic record

Source: International Conference on Recent Advances in Natural Language Processing, 2017, pp. 765-771 (7 pages)
Authors: Leonidas Tsekouras; Iraklis Varlamis; George Giannakopoulos
Original format: PDF
