Journal of the American Society for Information Science

Using the Wayback Machine to Mine Websites in the Social Sciences: A Methodological Resource

Abstract

Websites offer an unobtrusive data source for developing and analyzing information about various types of social science phenomena. In this paper, we provide a methodological resource for social scientists looking to expand their toolkit using unstructured web-based text, and in particular, with the Wayback Machine, to access historical website data. After providing a literature review of existing research that uses the Wayback Machine, we put forward a step-by-step description of how the analyst can design a research project using archived websites. We draw on the example of a project that analyzes indicators of innovation activities and strategies in 300 U.S. small- and medium-sized enterprises in green goods industries. We present six steps to access historical Wayback website data: (a) sampling, (b) organizing and defining the boundaries of the web crawl, (c) crawling, (d) website variable operationalization, (e) integration with other data sources, and (f) analysis. Although our examples draw on specific types of firms in green goods industries, the method can be generalized to other areas of research. In discussing the limitations and benefits of using the Wayback Machine, we note that both machine and human effort are essential to developing a high-quality data set from archived web information.
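
As an illustration of steps (b) and (c) only, and not the authors' implementation, the following Python sketch shows one way an analyst might bound a crawl to a single sampled site and year range, then list archived captures to fetch. It assumes the Wayback Machine's public CDX endpoint and its `web.archive.org/web/<timestamp>/<url>` replay URL pattern; the function names `build_cdx_query` and `snapshot_urls` and the domain `example.com` are hypothetical and chosen for the example.

```python
from urllib.parse import urlencode

# Public CDX index endpoint of the Wayback Machine (assumed here as the
# capture-listing service; the paper does not say which interface it used).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(site, year_from, year_to):
    """Build a CDX query URL that bounds the crawl to one site and year range."""
    params = {
        "url": site,
        "matchType": "domain",       # include pages on the site and its subdomains
        "from": str(year_from),      # earliest capture year
        "to": str(year_to),          # latest capture year
        "output": "json",            # rows returned as JSON arrays, header row first
        "filter": "statuscode:200",  # keep only successfully archived pages
        "collapse": "timestamp:4",   # merge adjacent captures that share a year
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_urls(cdx_rows):
    """Convert a CDX JSON payload into (timestamp, replay URL) pairs."""
    if not cdx_rows:
        return []
    header, *rows = cdx_rows
    ts_idx = header.index("timestamp")
    url_idx = header.index("original")
    return [
        (row[ts_idx], f"https://web.archive.org/web/{row[ts_idx]}/{row[url_idx]}")
        for row in rows
    ]


# Hypothetical payload in the shape the CDX endpoint returns with output=json.
sample = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,example)/", "20100101000000", "http://example.com/", "text/html", "200", "ABC", "123"],
]
query = build_cdx_query("example.com", 2008, 2012)
captures = snapshot_urls(sample)
```

The sketch deliberately makes no network request: in practice the query URL would be fetched (for instance with `urllib.request.urlopen`) at a polite rate, and each replay URL would then be downloaded and cleaned by hand or by script before variables are operationalized.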

Bibliographic details

  • Author affiliations

    School of Public Policy, Georgia Institute of Technology, Atlanta, GA 30332-0345

    School of Public Policy, Georgia Institute of Technology, Atlanta, GA 30332-0345

    Enterprise Innovation Institute, Georgia Institute of Technology, Atlanta, GA 30308

    Manchester Institute of Innovation Research, Manchester Business School, University of Manchester, Manchester, M13 9PL, UK

  • Original format: PDF
  • Text language: English
