Journal of Data and Information Science

CitationAS: A Tool of Automatic Survey Generation Based on Citation Content

Abstract

Purpose: This study aims to build an automatic survey generation tool, named CitationAS, based on citation content, represented by the set of citing sentences in the original articles.

Design/methodology/approach: First, we apply LDA to analyse the topic distribution of citation content. Second, in CitationAS, we use bisecting K-means, Lingo, and STC to cluster the retrieved citation content. Then Word2Vec, WordNet, and a combination of the two are applied to generate cluster labels. Next, we employ TF-IDF and MMR, also taking sentence location information into account, to extract important sentences, which are used to generate surveys. Finally, we evaluate the generated surveys manually.

Findings: In the experiments, we chose 20 high-frequency phrases as search terms. The results show that Lingo-Word2Vec, STC-WordNet, and bisecting K-means-Word2Vec produce better clustering results. On a 5-point evaluation scale, the survey quality scores obtained by the designed methods are close to 3, indicating that the surveys are within acceptable limits. Considering sentence location information improves survey quality. The combination of Lingo, Word2Vec, and either TF-IDF or MMR yields higher survey quality.

Research limitations: The manual evaluation method may be somewhat subjective. We use a simple linear function to combine Word2Vec and WordNet, which may not bring out their respective strengths. The generated surveys may miss new knowledge contributed by some articles, because such knowledge may be concentrated in sentences that contain no citations.

Practical implications: The CitationAS tool can automatically generate a comprehensive, detailed, and accurate survey according to the user's search terms. It can also help researchers learn about the research status of a given field.
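The abstract names TF-IDF, MMR, and sentence location as the ingredients of sentence extraction but does not say how they are combined. The following is a minimal Python sketch of one common way to do it, not the CitationAS implementation. It assumes: relevance is the TF-IDF cosine similarity between a citing sentence and the search term; location is modelled as a hypothetical bonus that decays linearly with the sentence's index (the paper's actual use of location information is not described here); and selection follows the standard MMR trade-off, lam * relevance - (1 - lam) * (max similarity to already-selected sentences). The names `mmr_select`, `lam`, and `position_weight` are illustrative only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """One sparse TF-IDF dict per document, using smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(sentences, query, k=3, lam=0.7, position_weight=0.1):
    """Greedily pick up to k sentences by Maximal Marginal Relevance.

    relevance  = TF-IDF cosine to the query + a hypothetical position bonus
                 that decays linearly with the sentence's index;
    redundancy = highest cosine to any sentence already selected.
    """
    vecs = tfidf_vectors(sentences + [query])
    sent_vecs, query_vec = vecs[:-1], vecs[-1]
    n = len(sentences)
    relevance = [
        cosine(v, query_vec) + position_weight * (1 - i / n)
        for i, v in enumerate(sent_vecs)
    ]
    selected, candidates = [], list(range(n))
    while candidates and len(selected) < k:
        scores = {}
        for i in candidates:
            redundancy = max(
                (cosine(sent_vecs[i], sent_vecs[j]) for j in selected), default=0.0
            )
            scores[i] = lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=scores.__getitem__)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]


if __name__ == "__main__":
    citing = [
        "Word2Vec learns word embeddings.",
        "Word2Vec learns word embeddings from text.",
        "WordNet links word senses.",
    ]
    # Pure relevance (lam=1.0) keeps the near-duplicate second sentence.
    print(mmr_select(citing, "word embeddings", k=2, lam=1.0))
    # Weighting redundancy (lam=0.5) swaps it for the WordNet sentence.
    print(mmr_select(citing, "word embeddings", k=2, lam=0.5))
```

In this sketch, IDF is fitted only on the candidate sentences plus the query for simplicity; a fuller system would more likely compute it over the whole citation corpus. Lowering `lam` trades relevance for diversity, which is the property that keeps near-duplicate citing sentences out of the generated survey.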