Human Rights Texts: Converting Human Rights Primary Source Documents into Data

Abstract

We introduce and make publicly available a large corpus of digitized primary source human rights documents which are published annually by monitoring agencies that include Amnesty International, Human Rights Watch, the Lawyers Committee for Human Rights, and the United States Department of State. In addition to the digitized text, we also make available and describe document-term matrices, which are datasets that systematically organize the word counts from each unique document by each unique term within the corpus of human rights documents. To contextualize the importance of this corpus, we describe the development of coding procedures in the human rights community and several existing categorical indicators that have been created by human coding of the human rights documents contained in the corpus. We then discuss how the new human rights corpus and the existing human rights datasets can be used with a variety of statistical analyses and machine learning algorithms to help scholars understand how human rights practices and reporting have evolved over time. We close with a discussion of our plans for dataset maintenance, updating, and availability.
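To make the idea of a document-term matrix concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the two document identifiers and their texts are invented for illustration, and the lowercase, letters-only tokenization is an assumption, since the abstract does not describe how the corpus was preprocessed.

```python
from collections import Counter
import re


def tokenize(text):
    # Assumed tokenization: lowercase, keep runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (vocab, matrix): one row of word counts per document,
    one column per unique term found anywhere in the collection."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    vocab = sorted(set().union(*counts.values()))
    matrix = {doc_id: [c[term] for term in vocab] for doc_id, c in counts.items()}
    return vocab, matrix


# Hypothetical two-document collection, not drawn from the actual corpus.
docs = {
    "report_a": "Torture was reported. Torture persists.",
    "report_b": "No torture was reported.",
}

vocab, matrix = document_term_matrix(docs)
print(vocab)
for doc_id, row in matrix.items():
    print(doc_id, row)
```

Each row corresponds to one document and each column to one term, so the entry for `report_a` under `torture` is the number of times that word appears in that report. At the scale of a real corpus, a sparse representation would likely be preferable to the dense lists used here.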