Conference: Annual Meeting of the Association for Information Science and Technology

Towards Building a Collection of Web Archiving Research Articles

Abstract

The field of Web Archiving exists in a fluid, fragmented, and heterogeneous state. Part of the problem is that this field is relatively new and its literature is scattered across a wide range of journal and conference venues. This makes the state of Web Archiving as a discipline particularly difficult to ascertain. This paper presents an approach to building a collection of articles about the subject. We begin with a small dataset of articles taken from a Web Archiving Bibliography and then proceed to expand it by crawling the Web and collecting additional documents. The crawled documents are then classified using machine learning classification techniques. We show that by extracting the documents' titles and abstracts and representing them using the "bag of words" approach, we are able to accurately identify documents from the Web crawler as documents that are about Web Archiving. We also discuss our results in the context of Web Archiving as an emerging field.
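The classification step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the classifier or the preprocessing used, so everything below is an assumption made for illustration: the multinomial Naive Bayes model, the simple lowercase tokenizer, the function and class names (`bag_of_words`, `NaiveBayes`), and the toy training examples are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def bag_of_words(title, abstract):
    """Join title and abstract, lowercase them, and count word occurrences."""
    text = f"{title} {abstract}".lower()
    return Counter(re.findall(r"[a-z]+", text))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed classifier)."""

    def fit(self, bags, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for bag, label in zip(bags, labels):
            self.word_counts[label].update(bag)
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, bag):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            # Start from the log prior of the class.
            score = math.log(self.doc_counts[c] / total_docs)
            for word, n in bag.items():
                if word not in self.vocab:
                    continue  # skip words never seen during training
                p = (self.word_counts[c][word] + 1) / (total_words + len(self.vocab))
                score += n * math.log(p)
            scores[c] = score
        return max(scores, key=scores.get)


# Toy, made-up training data for illustration only; not from the paper.
train = [
    ("Preserving web pages in an archive",
     "We crawl and archive web sites for long-term preservation.",
     "web_archiving"),
    ("Evaluating web archive coverage",
     "How much of the web do archives such as crawled collections preserve?",
     "web_archiving"),
    ("Deep learning for image recognition",
     "A neural network classifies photographs of animals.",
     "other"),
    ("Protein folding simulation",
     "We simulate molecular dynamics of proteins.",
     "other"),
]
bags = [bag_of_words(title, abstract) for title, abstract, _ in train]
labels = [label for _, _, label in train]
model = NaiveBayes().fit(bags, labels)

# A hypothetical crawled document, reduced to its title and abstract.
candidate = bag_of_words(
    "Crawling strategies for web archives",
    "We compare ways to crawl and preserve web pages.",
)
print(model.predict(candidate))
```

In a real pipeline the training set would come from the seed bibliography plus negative examples, and a library implementation with stop-word removal and held-out evaluation would likely replace this hand-written model.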