Conference: APWeb-WAIM Data Science Workshop

A Web-Based Theme-Related Word Set Construction Algorithm

Abstract

Constructing a theme-related word set is a basic task in building theme-oriented information retrieval systems. Most previous studies focus on identifying the representative words of a specific document, and few address the construction of a word set related to a theme. Based on an analysis of existing keyword extraction methods, this paper proposes a method that automatically constructs a theme-related word set from a primary theme-related word set given by domain experts and from well-known websites related to the theme. First, the method uses existing information extraction techniques to obtain the documents from the websites and a keyword set for each document. It then calculates the degree of correlation between the known theme-related word set and the document's keyword set, derives from this document-theme relevance the set of the document's words that are related to the theme, and merges that set into the theme-related word set. In this way, the theme-related word set is supplemented iteratively using the documents obtained from the theme-related websites. Because little research has focused on this problem and no relevant experimental data set exists, the paper applies the proposed method to construct theme-related word sets for two themes, "electricity" and "college entrance examination", and invites domain experts to evaluate them. The results show that the method can produce a relatively complete theme-related word set, which demonstrates its feasibility.
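The iterative expansion described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the correlation formula, the acceptance threshold, or the stopping rule, so everything specific here is an assumption made for illustration: relevance is taken as the fraction of a document's keywords already in the theme set, and the names `relevance`, `expand_theme_words`, `threshold`, and `max_rounds` are hypothetical. Keyword extraction is assumed to have happened already, so each document is represented only by its keyword set.

```python
from typing import Iterable, List, Set


def relevance(theme_words: Set[str], doc_keywords: Set[str]) -> float:
    """Assumed measure: share of the document's keywords already in the theme set."""
    if not doc_keywords:
        return 0.0
    return len(theme_words & doc_keywords) / len(doc_keywords)


def expand_theme_words(
    seed_words: Iterable[str],
    documents: List[Set[str]],
    threshold: float = 0.5,   # hypothetical cut-off, not taken from the paper
    max_rounds: int = 10,     # hypothetical safety limit on iterations
) -> Set[str]:
    """Grow the expert-provided seed set by merging keywords of relevant documents."""
    theme = set(seed_words)
    for _ in range(max_rounds):
        added: Set[str] = set()
        for keywords in documents:
            # Only documents judged relevant to the theme contribute new words.
            if relevance(theme, keywords) >= threshold:
                added |= keywords - theme
        if not added:
            break  # no new words in this round, so the set has stabilised
        theme |= added
    return theme


# Toy data: keyword sets standing in for documents crawled from theme websites.
seed = {"electricity", "power grid"}
docs = [
    {"electricity", "power grid", "transformer"},
    {"transformer", "substation"},
    {"football", "league", "goal"},
]
result = expand_theme_words(seed, docs)
print(sorted(result))
# ['electricity', 'power grid', 'substation', 'transformer']
```

In this toy run the second document only becomes relevant after "transformer" is merged in the first round, which is why the expansion has to be repeated rather than done in a single pass. The unrelated third document never reaches the threshold and contributes nothing.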