Journal: Knowledge-Based Systems

Mining large samples of web-based corpora

Abstract

This paper presents a method to automatically mirror, process, and compare large samples of text corpora from Web-based information systems. The wealth of textual information contained in publicly available Web sites is converted into aggregated representations through textual analysis. The application of word lists, keyword analysis, term clustering, and correspondence analyses to identify and represent semantic relationships, including their longitudinal patterns, is illustrated through a case study that investigates the global coverage of solar power technologies in international media. The resulting graphs, indicators and tables describe complex relationships and developments that are hard to capture in traditional ways. As such they facilitate investigations about the nature and dynamics of Web content.
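The abstract names word lists and keyword analysis but does not say how keywords are scored. Purely as an illustration, here is a minimal sketch that assumes one common choice: Dunning's log-likelihood (G²) statistic, which compares a term's frequency in a target corpus against a reference corpus. The tokenizer, the function names `word_list`, `log_likelihood` and `keywords`, and the two sample sentences are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase alphabetic tokens, allowing one internal apostrophe.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def word_list(text: str) -> Counter:
    """Build a frequency word list for one corpus (or one mirrored page)."""
    return Counter(TOKEN_RE.findall(text.lower()))


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's log-likelihood (G2) keyness score.

    a, b: frequency of the term in the target and reference corpus
    c, d: total number of tokens in the target and reference corpus
    """
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    g2 = 0.0
    # A zero observed count contributes nothing (0 * log 0 is taken as 0).
    if a > 0:
        g2 += a * math.log(a / expected_target)
    if b > 0:
        g2 += b * math.log(b / expected_reference)
    return 2.0 * g2


def keywords(target: Counter, reference: Counter, top_n: int = 10) -> list[tuple[str, float]]:
    """Rank terms that are relatively more frequent in the target corpus."""
    c = sum(target.values())
    d = sum(reference.values())
    if c == 0 or d == 0:
        return []  # an empty corpus has no meaningful keyness
    scored = []
    for term in set(target) | set(reference):
        a, b = target[term], reference[term]  # Counter returns 0 for absent terms
        # Keep only "positive" keywords (overused in the target).
        if a / c > b / d:
            scored.append((term, log_likelihood(a, b, c, d)))
    # Highest keyness first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


if __name__ == "__main__":
    # Toy stand-ins for a topical sample and a general reference sample.
    solar = "solar panels and solar power plants expand as photovoltaic solar capacity grows"
    general = "power plants and markets expand as capacity grows and prices fall"
    for term, score in keywords(word_list(solar), word_list(general), top_n=3):
        print(f"{term}\t{score:.2f}")
```

In this toy run, "solar" ranks first because it occurs three times in the target sample and never in the reference sample. Log-likelihood is a frequent choice for sparse word counts; the paper may use a different statistic.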