Conference: International Conference on Advanced Informatics: Concept Theory and Applications

Interpretable Semantic Textual Similarity for Indonesian Sentence


Abstract

We develop an iSTS (Interpretable Semantic Textual Similarity) model for an Indonesian corpus. An iSTS system not only produces an STS (Semantic Textual Similarity) score but also explains the semantic similarity of a pair of sentences. Here, an explanation consists of pairs of chunks, each with a type (such as EQUI, OPPO, SPE1, SPE2, REL, SIMI, or NOALI) and a score ranging from 0 to 5. Because no Indonesian iSTS corpus exists yet, we build one. We adapt the two best iSTS techniques for English corpora: VRep and UWB. VRep uses WordNet to represent word semantics, while UWB uses word embeddings. Both techniques follow a similar process: preprocessing, feature extraction, and classification. In this research, we adapt VRep and UWB by replacing their English resources, such as WordNet and word embeddings, with Indonesian ones. We also use four classifiers: decision tree, SVM, random forest, and multilayer perceptron. VRep is the best model on the type aspect and on the score aspect, while UWB is the best model on the type + score aspect.
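To make the notion of an explanation concrete, the following is a minimal sketch of how one chunk pair with its type and score might be represented. The class name `ChunkAlignment`, its field names, and the example Indonesian chunks are illustrative assumptions; they are not taken from the paper or from the VRep or UWB systems.

```python
from dataclasses import dataclass

# Alignment types named in the abstract.
ALIGNMENT_TYPES = {"EQUI", "OPPO", "SPE1", "SPE2", "REL", "SIMI", "NOALI"}


@dataclass(frozen=True)
class ChunkAlignment:
    """One chunk pair with its type and score (hypothetical structure)."""
    chunk1: str   # chunk from the first sentence
    chunk2: str   # chunk from the second sentence
    type: str     # one of ALIGNMENT_TYPES
    score: float  # assumption: a numeric value in the range 0 to 5

    def __post_init__(self):
        # Reject labels outside the set listed in the abstract.
        if self.type not in ALIGNMENT_TYPES:
            raise ValueError(f"unknown alignment type: {self.type}")
        # Reject scores outside the 0 to 5 range stated in the abstract.
        if not 0 <= self.score <= 5:
            raise ValueError(f"score out of range 0-5: {self.score}")


# Made-up example pair, for illustration only.
example = ChunkAlignment("seorang pria", "seorang laki-laki", "EQUI", 5)
print(example)
```

Validating the type and score at construction time keeps malformed annotations out of later steps such as feature extraction and classification.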
