New Information Content Glossary Relatedness (ICGR) Approach for Short Text Similarity (STS) Tasks

BenOmran Ali Muftah; Ab Aziz Mohd Juzaiddin

Abstract

Methods that measure the semantic relatedness of words by complementing Wikipedia-based (corpus-based) methods with WordNet-based (knowledge-based) methods take two forms, combined and integrative, both of which aim to increase the semantic space between related words. However, each form has its own issues regarding its components and the strategy used to combine or integrate the corpus-based and knowledge-based methods. In the integrative strategy, a large corpus such as Wikipedia is used to extract a set of related words for a particular concept as a basis for searching the WordNet space. The drawback of this strategy is its use of a fixed scaling parameter, which fits only the implemented dataset, where the results are close to the human scores. Other corpus-based methods use an experimentally determined cut-off threshold to reduce the semantic space and to focus the search on a more accurate semantic space. Such methods take into account only the frequency of bigrams and ignore the frequency of individual terms. Knowledge-based methods that use gloss overlap have a similar limitation: they lose many valuable relatedness features that would yield a more accurate measurement. This paper therefore proposes a new Information Content Glossary Relatedness (ICGR) approach in two steps. First, an Extended-PMI based on a cut-off density threshold is proposed to extract a robust Relatedness Vector Set (RVS) from a large Wikipedia dataset. Second, a Semantic Structural Information (SSI) method is presented that uses the RVS as a fulcrum to identify the most related gloss in WordNet for each gloss and to select the top 5 glosses related to each RVS. The results show that the proposed approach outperforms state-of-the-art methods: the Extended-PMI achieved a Spearman's correlation of 0.89 with the human scores, and the ICGR approach achieved a Spearman's correlation of 0.8 with the human scores.
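
To make the corpus-based step more concrete, the following is a minimal sketch of standard pointwise mutual information (PMI) with a cut-off threshold. It is not the Extended-PMI of the paper, whose exact formulation is not given in the abstract. The toy corpus, the function name pmi_relatedness, the sentence-level co-occurrence window, and the cutoff value are all assumptions made for illustration. The sketch uses both the joint frequency of a word pair and the frequency of each individual term, and keeps only the words whose PMI with the target exceeds the cut-off.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_relatedness(sentences, target, cutoff=0.0):
    """Return {word: PMI(target, word)} for words whose PMI exceeds `cutoff`.

    Standard PMI over sentence-level co-occurrence (an assumed window),
    not the Extended-PMI described in the paper.
    """
    term_counts = Counter()   # number of sentences containing each term
    pair_counts = Counter()   # number of sentences containing both terms
    for tokens in sentences:
        unique = sorted(set(tokens))
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    n = len(sentences)
    scores = {}
    for (a, b), joint in pair_counts.items():
        if target not in (a, b):
            continue
        other = b if a == target else a
        # PMI = log2( p(a, b) / (p(a) * p(b)) )
        pmi = math.log2((joint / n) / ((term_counts[a] / n) * (term_counts[b] / n)))
        if pmi > cutoff:
            scores[other] = pmi
    return scores


# Hypothetical toy corpus; a real run would use tokenized Wikipedia text.
corpus = [
    ["car", "engine", "road"],
    ["car", "engine", "wheel"],
    ["car", "road"],
    ["bank", "money"],
    ["bank", "river"],
    ["car", "bank"],
]
scores = pmi_relatedness(corpus, "car", cutoff=0.0)
print(sorted(scores))
```

With this toy corpus, "bank" co-occurs with "car" less often than chance would predict, so its PMI is negative and the cut-off removes it from the resulting set of related words.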

Bibliographic record

Source: Journal of computer sciences, 2019, Issue 6, 16 pages
Authors: BenOmran Ali Muftah; Ab Aziz Mohd Juzaiddin
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords: Wikipedia; PMI; WordNet; Gloss; Structural Information; STS
