Journal: Quality Control, Transactions

Retrieval of Scientific Documents Based on HFS and BERT


Abstract

Retrieving scientific documents whose main content is mathematical expressions requires considering both the expressions and the features of their surrounding text. However, mathematical expressions differ from natural-language text in grammar and semantics, which makes it difficult to integrate the two kinds of features in a single retrieval method. This study proposes a scientific document retrieval method based on HFS (Hesitant Fuzzy Sets) and BERT (Bidirectional Encoder Representations from Transformers). The method exploits the strengths of HFS in multi-attribute decision making and of BERT in context-dependent similarity calculation. Mathematical expressions are parsed and the membership degrees of the multiple attributes of their symbols are calculated to obtain the similarity between expressions, which improves the accuracy of mathematical expression recall. The text surrounding each expression is then extracted, and BERT is used to calculate the context similarity. Finally, the recalled scientific documents are sorted by context similarity to produce the final retrieval result. Experiments were carried out on 10,372 Chinese and 11,770 English scientific documents in the NTCIR extended data set. The average MAP_$k$ ($k=10$) for the recall results was 74.13%, and the average nDCG ($n=10$) for the ranking was 86.04%.
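For readers unfamiliar with the two reported metrics, the following is a minimal Python sketch of how MAP at a cutoff $k$ and nDCG at a cutoff $n$ are commonly computed. It is not taken from the paper: the function names, the choice to normalise average precision by the number of relevant hits within the top $k$, and the choice to build the ideal ranking from the retrieved list alone are all assumptions, and the authors' exact definitions may differ.

```python
import math


def average_precision_at_k(relevant_flags, k=10):
    """AP@k for one query.

    relevant_flags[i] is True if the result at rank i + 1 is relevant.
    Assumption: the sum is normalised by the number of relevant hits
    found in the top k, which is one common convention among several.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevant_flags[:k], start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision_at_k(runs, k=10):
    """MAP@k: the mean of AP@k over all queries (one flag list per query)."""
    return sum(average_precision_at_k(flags, k) for flags in runs) / len(runs)


def dcg_at_n(gains, n=10):
    """Discounted cumulative gain over the top n graded relevance scores."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:n], start=1))


def ndcg_at_n(gains, n=10):
    """nDCG@n: DCG divided by the DCG of the ideal ordering.

    Assumption: the ideal ordering is obtained by sorting the retrieved
    list itself, not the full set of judged documents.
    """
    ideal = dcg_at_n(sorted(gains, reverse=True), n)
    return dcg_at_n(gains, n) / ideal if ideal > 0 else 0.0
```

In this sketch, a query whose relevant documents already appear in descending order of graded relevance yields an nDCG of 1, and a query with no relevant documents in the top $k$ contributes 0 to the MAP average.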
