Journal: Social Science Computer Review

Trails of Data: Three Cases for Collecting Web Information for Social Science Research


Abstract

As the availability of online data grows rapidly, researchers are confronted with a pressing question: How should social scientists collect Internet data for research? This study focuses on one of the most commonly used data collection techniques: web scraping. Going beyond canned approaches by leveraging a general framework of data communication, this study illustrates how online information can be systematically queried and fetched for reproducible research. To generalize our approaches, we additionally explore the variations in site security and architecture that analysts may encounter during the scraping process before they are given access to the desired data. The approaches we introduce do not rely on any proprietary software and can be easily implemented on any computing platform with programming languages such as Python or R. The methodological discussion in this study is meant to be applicable to current web-based research efforts. We include three examples with complete Python implementations. We also present an integrated workflow that enables researchers to produce analytical data sets that are traceable and thus verifiable for analysis or replication. Lastly, options related to the validity and efficiency of data are discussed, and we highlight the ongoing debate surrounding the ethics of online data collection, ultimately advocating for the fair use of online data.
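The abstract's idea of a traceable scraping workflow can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a page has already been fetched (for example with `urllib.request.urlopen`), parses it with the standard-library `html.parser`, and stores the source URL, a retrieval timestamp, and a SHA-256 hash of the raw HTML next to the extracted records. The names `LinkExtractor`, `scrape_with_provenance`, `SAMPLE_HTML`, and the example URL are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect the href and text of every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(
                {"href": self._href, "text": "".join(self._text).strip()}
            )
            self._href = None


def scrape_with_provenance(url, raw_html, fetched_at=None):
    """Parse raw_html and return the records together with provenance metadata."""
    parser = LinkExtractor()
    parser.feed(raw_html)
    parser.close()
    return {
        "source_url": url,
        # Assumption: a UTC ISO 8601 timestamp is enough to trace the retrieval.
        "fetched_at": fetched_at or datetime.now(timezone.utc).isoformat(),
        # Hash of the raw page, so a later replication can check it saw the same input.
        "sha256": hashlib.sha256(raw_html.encode("utf-8")).hexdigest(),
        "records": parser.links,
    }


# Offline stand-in for a fetched page, so the sketch runs without network access.
SAMPLE_HTML = '<html><body><a href="/a">First</a> <a href="/b">Second</a></body></html>'

result = scrape_with_provenance(
    "https://example.org/list", SAMPLE_HTML, fetched_at="2024-01-01T00:00:00+00:00"
)
print(json.dumps(result, indent=2))
```

Keeping the hash and timestamp with the parsed records is one possible way to make a data set verifiable: anyone holding the archived raw page can recompute the hash and rerun the parser to confirm the records.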
