
Automatic Topic Extraction from Research Articles Using N-gram Analysis



Abstract

Identifying the topic of an article can involve a lot of manual work, and the manual process becomes exhausting when the volume of articles is large. To tackle this problem, we propose an automated topic extraction approach that can extract topics for a large number of articles with efficiency in mind. Our research builds on existing N-gram analysis, which only calculates how frequently words appear in a document. On top of it, we apply our customized filtering standards to improve efficiency and to eliminate as many irrelevant or noncritical phrases as possible. In this way, we make sure that the keyphrases finally selected for each article are unique labels that represent the core idea of that specific article. In our case, we focus on research papers within the autonomous vehicle domain because these research papers are in high demand in our daily life. Since most research papers are available only in PDF format, we first convert the PDF files into an editable file type such as TXT. To realize the automation and test our proposed idea, we selected a large number of autonomous vehicle-related articles. We then observe the results and compare them with the results of manual topic extraction to evaluate our approach.
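The abstract describes frequency-based N-gram analysis followed by filtering, but does not spell out the filtering standards themselves. The following is a minimal sketch of that general idea, not the authors' implementation: the stopword list, the `min_count` threshold, and the function names `tokenize`, `ngrams`, and `candidate_phrases` are all assumptions introduced here for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list. The abstract does not state the authors'
# actual filtering standards, so this small set is only an assumption.
STOPWORDS = {
    "a", "an", "and", "are", "as", "be", "by", "can", "for", "in",
    "is", "it", "of", "on", "our", "that", "the", "this", "to", "we", "with",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_phrases(text, n=2, min_count=2):
    """Count n-grams and keep those that pass two simple filters.

    Filter 1 (assumed): the phrase occurs at least min_count times.
    Filter 2 (assumed): the phrase neither starts nor ends with a stopword.
    Note that n-grams are taken over the whole token stream, so a phrase
    may span a sentence boundary.
    """
    counts = Counter(ngrams(tokenize(text), n))
    kept = {}
    for gram, count in counts.items():
        if count < min_count:
            continue
        if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
            continue
        kept[" ".join(gram)] = count
    # Most frequent first; ties are broken alphabetically.
    return sorted(kept.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = (
        "Lane detection helps the autonomous vehicle. "
        "The autonomous vehicle uses lane detection for the planning of a route."
    )
    for phrase, count in candidate_phrases(sample):
        print(count, phrase)
```

In this sketch the input is plain text, matching the abstract's step of converting PDF files to TXT before analysis; the conversion itself is left out. Raising `min_count` or extending the stopword list would make the filter stricter at the cost of dropping rarer phrases.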

Bibliographic details

  • Authors

    Chen Maomao; Huang Maoyi;

  • Year 2016
  • Original format PDF
  • Text language eng
