Information Processing & Management

Excavating the mother lode of human-generated text: A systematic review of research that uses the wikipedia corpus

Abstract

Although Wikipedia is primarily an encyclopedia, its expansive content provides a knowledge base that researchers in a wide variety of domains have continuously exploited. This article systematically reviews the scholarly studies that have used Wikipedia as a data source and investigates how Wikipedia has been employed in three main computer science research areas: information retrieval, natural language processing, and ontology building. We report and discuss the research trends across the identified and examined studies. We further identify and classify tools that can be used to extract data from Wikipedia, and compile a list of currently available data sets extracted from Wikipedia.