Journal: History and Computing

LOST IN THE INFINITE ARCHIVE: THE PROMISE AND PITFALLS OF WEB ARCHIVES


Abstract

Contemporary and future historians need to grapple with the challenges posed by web archives. These large collections of material, accessed either through the Internet Archive's Wayback Machine or through other computational methods, represent both a challenge and an opportunity for historians. Through these collections, we have the potential to access the voices of millions of non-elite individuals (recognizing, of course, the cleavages in both Web access and method of access). To put this in perspective, the Old Bailey Online currently describes its monumental holdings of 197,745 trials between 1674 and 1913 as the "largest body of texts detailing the lives of non-elite people ever published." GeoCities.com, a platform for everyday web publishing in the mid-to-late 1990s and early 2000s, amounted to over thirty-eight million individual webpages. Historians will have access, in some form, to millions of pages written by everyday people of various classes, genders, ethnicities, and ages. While the Web was not a perfect democracy by any means (it was and is unevenly accessed across each of those categories), this still represents a massive collection of non-elite speech. Yet a figure like thirty-eight million webpages is both a blessing and a curse. We cannot read every website, and must instead rely upon discovery tools to find the information that we need. Yet these tools largely do not exist for web archives, or are in a very early state of development: what will they look like? What information do historians want to access? We cannot simply map over web tools optimized for discovering current information through online searches or metadata analysis. We need to find information that mattered at the time, to diverse and very large communities. Furthermore, web pages cannot be viewed in isolation, outside of the networks that they inhabited. In theory, amongst corpuses of millions of pages, researchers can find whatever they want to confirm. The trick is situating it in a larger social and cultural context: is it representative? Unique? In this paper, "Lost in the Infinite Archive," I explore what the future of digital methods for historians will be when they need to explore web archives. Historical research on periods beginning in the mid-1990s will need to use web archives, and right now we are not ready. This article draws on first-hand research with the Internet Archive and Archive-It web archiving teams. It draws upon three exhaustive datasets: the large Web ARChive (WARC) files that make up Wide Web Scrapes of the Web; the metadata-intensive WAT files that provide networked contextual information; and the lifted-straight-from-the-web guerilla archives generated by groups like Archive Team. Through these case studies, we can see, hands-on, what richness and potential lie in these new cultural records, and what approaches we may need to adopt. This helps underscore the need to have humanists involved at this early, crucial stage.
