IEEE International Conference on Big Data

Augmenting Co-Training With Recommendations to Classify Human Rights Violations

Abstract

In recent years, many human rights organizations have started using social media to identify, collect, and document human rights violations. Manually extracting relevant data from this large corpus of social network data is difficult, time-consuming, and expensive. Furthermore, as technology advances, the context and significance of human rights abuses have changed and will continue to change over time, and expert advice is needed to perform any kind of quantitative analysis on this data. Applications and systems exist that help structure this data into relevant categories, but detecting underlying latent patterns, finding similar annotated patterns, and continuously upgrading the system to support exploratory analysis require considerable maintenance and cost. This paper proposes a solution to this problem that integrates semi-supervised learning (with matrix factorization) and similarity-measure algorithms to classify the large unstructured corpus into stories labelled with one or more types of human rights abuse. Over the last few decades, recommender systems have emerged as powerful machine learning tools for drawing inferences from data and providing value-added content. Similarly, semi-supervised algorithms address situations in which there is relatively little labelled training data but a large unlabelled dataset. This paper combines both approaches to discover patterns in unlabelled victim and survivor stories and to recommend labels drawn from other, similar stories, thereby updating the initial labelled set. The algorithm's efficiency is evaluated using state-of-the-art evaluation metrics. Experimental results show a correlation between new and labelled stories. Real-world results show that the algorithm outperforms some in-house recommendation algorithms.
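The label-recommendation idea above (propagate labels from similar labelled stories, then grow the labelled set) can be illustrated with a minimal sketch. This is not the paper's method: it swaps the co-training and matrix factorization components for a plain cosine-similarity nearest-neighbour vote over bag-of-words vectors inside a self-training loop. The function names (`recommend_labels`, `self_train`), the `threshold` and `k` parameters, and the toy stories are all hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (lower-cased, whitespace-tokenised)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend_labels(story, labelled, k=2):
    """Hypothetical recommender: sum similarity votes for each label of the k nearest labelled stories."""
    vec = vectorize(story)
    scored = [(cosine(vec, vectorize(text)), labels) for text, labels in labelled]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    scores = Counter()
    for sim, labels in scored[:k]:
        for label in labels:  # multi-label: a story may carry several abuse types
            scores[label] += sim
    return scores

def self_train(labelled, unlabelled, threshold=0.3, k=2, max_rounds=5):
    """Hypothetical self-training loop: add stories whose recommended labels clear the threshold."""
    labelled = list(labelled)
    pending = list(unlabelled)
    for _ in range(max_rounds):
        added = []
        for story in pending:
            scores = recommend_labels(story, labelled, k)
            chosen = {label for label, score in scores.items() if score >= threshold}
            if chosen:
                added.append((story, chosen))
        if not added:  # nothing confident enough; stop iterating
            break
        labelled.extend(added)  # newly labelled stories can inform the next round
        accepted = {story for story, _ in added}
        pending = [s for s in pending if s not in accepted]
    return labelled, pending

# Toy usage with invented stories (illustrative only).
labelled = [
    ("police detained the journalist without charge", {"arbitrary detention"}),
    ("soldiers burned homes in the village", {"destruction of property"}),
]
unlabelled = ["the journalist was detained by police for weeks", "weather was sunny today"]
new_labelled, remaining = self_train(labelled, unlabelled)
```

In this toy run the detention story inherits the "arbitrary detention" label, while the unrelated weather sentence never clears the threshold and stays unlabelled. The threshold is what keeps weakly similar stories from polluting the labelled set as it grows.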