Abstract:
In scientific and technical literature retrieval, traditional methods based on keyword matching have no understanding of the underlying knowledge and no way of processing it. They retrieve only documents that contain the query keywords and miss documents that are semantically similar to them, so neither the precision nor the recall of the results meets users' needs. To improve retrieval effectiveness, this paper introduces fuzzy rough set theory, which is well suited to handling uncertain knowledge, into information retrieval to remedy these shortcomings of the conventional retrieval model. First, the classical mutual information function is used to compute semantic association weights between index terms, and these weights define a fuzzy approximation space. Second, the TF-IDF method produces a fuzzy vector representation of each document. The weight of an index term reflects both its frequency of occurrence and its location in the document, while the fuzzy vector of a query is determined entirely by the user's interests. Finally, the fuzzy approximation space is used to expand the query keywords conceptually and to mine classes of similar concepts, and the upper and lower approximation sets of the fuzzy document and query representations are computed. Documents are then matched to queries by fuzzy matching of these upper and lower approximation sets using the conjunction and disjunction formulas of Boolean logic rather than by keyword matching, and the results are returned ranked by similarity.
Simulation experiments show that the fuzzy rough set retrieval model improves retrieval performance on scientific and technical documents and supports concept-level retrieval of such documents.
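The pipeline summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The abstract does not give exact formulas, so the sketch makes these choices. Normalized pointwise mutual information over document co-occurrence stands in for the "mutual information function." A hypothetical `LOCATION_WEIGHT` table (title 3.0, body 1.0) models the location factor. IDF is smoothed. The upper and lower approximations are the standard min/max fuzzy rough definitions. The matching score is a hypothetical fuzzy Jaccard measure (min as conjunction, max as disjunction), averaged over the upper and lower sets. All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical location weights: a term in the title counts more than one in the body.
LOCATION_WEIGHT = {"title": 3.0, "body": 1.0}


def term_association(docs, vocab):
    """Fuzzy relation R[(a, b)] in [0, 1] from normalized PMI of co-occurrence (assumed measure)."""
    n = len(docs)
    sets = [set(d["title"]) | set(d["body"]) for d in docs]
    df = {t: sum(t in s for s in sets) for t in vocab}
    R = {}
    for a in vocab:
        for b in vocab:
            if a == b:
                R[(a, b)] = 1.0  # every term is fully related to itself
                continue
            co = sum(a in s and b in s for s in sets)
            if co == 0:
                R[(a, b)] = 0.0  # terms never co-occur: no association
                continue
            p_ab = co / n
            pmi = math.log(p_ab / ((df[a] / n) * (df[b] / n)))
            # NPMI lies in [-1, 1]; guard the p_ab == 1 case where -log(p_ab) is zero.
            npmi = pmi / -math.log(p_ab) if p_ab < 1 else 1.0
            R[(a, b)] = max(0.0, npmi)  # clip negative association to 0
    return R


def fuzzy_doc_vector(doc, docs, vocab):
    """Location-weighted TF-IDF, scaled to [0, 1] so it can act as a fuzzy set."""
    n = len(docs)
    tf = Counter()
    for field, weight in LOCATION_WEIGHT.items():
        for t in doc[field]:
            tf[t] += weight
    vec = {}
    for t in vocab:
        df = sum(t in d["title"] or t in d["body"] for d in docs)
        vec[t] = tf[t] * (math.log((n + 1) / (df + 1)) + 1.0)  # smoothed IDF
    top = max(vec.values()) or 1.0
    return {t: v / top for t, v in vec.items()}


def upper(A, R, vocab):
    """Upper approximation: sup_y min(R(x, y), A(y))."""
    return {x: max(min(R[(x, y)], A[y]) for y in vocab) for x in vocab}


def lower(A, R, vocab):
    """Lower approximation: inf_y max(1 - R(x, y), A(y))."""
    return {x: min(max(1.0 - R[(x, y)], A[y]) for y in vocab) for x in vocab}


def fuzzy_jaccard(A, B, vocab):
    inter = sum(min(A[t], B[t]) for t in vocab)  # conjunction as min
    union = sum(max(A[t], B[t]) for t in vocab)  # disjunction as max
    return inter / union if union else 0.0


def rank(docs, query, vocab):
    """Return (score, doc_index) pairs sorted by descending similarity."""
    R = term_association(docs, vocab)
    q = {t: query.get(t, 0.0) for t in vocab}
    q_up, q_low = upper(q, R, vocab), lower(q, R, vocab)
    scores = []
    for i, d in enumerate(docs):
        dv = fuzzy_doc_vector(d, docs, vocab)
        s_up = fuzzy_jaccard(upper(dv, R, vocab), q_up, vocab)
        s_low = fuzzy_jaccard(lower(dv, R, vocab), q_low, vocab)
        scores.append((0.5 * (s_up + s_low), i))
    return sorted(scores, reverse=True)


# Toy collection and a single-term query (hypothetical data).
DOCS = [
    {"title": ["fuzzy", "retrieval"], "body": ["rough", "sets"]},
    {"title": ["rough", "sets"], "body": ["approximation"]},
    {"title": ["neural", "network"], "body": ["training"]},
]
VOCAB = sorted({t for d in DOCS for f in ("title", "body") for t in d[f]})
QUERY = {"fuzzy": 1.0}  # user-interest weights

for score, i in rank(DOCS, QUERY, VOCAB):
    print(f"doc {i}: {score:.3f}")
```

Document 1 never contains "fuzzy" yet scores above zero, because "rough" and "sets" co-occur with "fuzzy" elsewhere in the collection. Document 2 shares no associated terms and scores zero. With a single-term query the lower approximation of the query is empty, so in this toy example the ranking is driven by the upper approximations.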