IEEE International Conference on Big Data

Augmenting Co-Training With Recommendations to Classify Human Rights Violations

Abstract

In recent years, many human rights organizations have started using social media to identify, collect and document human rights violations. Manually extracting relevant data from this large corpus of social network data is difficult, time-consuming and expensive. Furthermore, as technology evolves, the context and significance of human rights abuses have changed and will continue to change over time, and expert advice is needed to perform any kind of quantitative analysis on this data. Applications and systems exist that help structure this data into relevant categories, but detecting underlying latent patterns, finding similar annotated patterns and continuously upgrading the system to support exploratory analysis entail high maintenance effort and cost. This paper proposes a solution that integrates semi-supervised learning (with matrix factorization) and similarity-measure algorithms to classify the large unstructured corpus into stories labelled with one or more types of human rights abuse. In the last few decades, recommender systems have emerged as powerful machine learning tools for drawing inferences from data and providing value-added content. In the same vein, semi-supervised algorithms address situations where there is relatively little labelled training data but a large unlabelled dataset. This paper combines both approaches to discover patterns in unlabelled victim-survivor stories and to recommend labels for them from other, similar stories, thereby updating the initial labelled set. The efficiency of the algorithm is evaluated using state-of-the-art evaluation metrics. Experimental results show a correlation between new and labelled stories, and real-world results show that the algorithm outperforms some in-house recommendation algorithms.
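The label-recommendation idea described above (factorize the story data into latent vectors, find similar stories, propagate their labels, and grow the labelled set) can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes a story-by-term count matrix, factorizes it with plain L2-regularised gradient updates, scores neighbours by cosine similarity in the latent space, and swaps co-training proper (two classifiers trained on two feature views) for a simpler single-view self-training loop. The toy data, the label names, the functions `factorize`, `cosine`, `recommend_labels` and `augment_labels`, and the parameters `rank`, `k` and `threshold` are all hypothetical.

```python
import math
import random

# Hypothetical toy data: rows are stories, columns are counts of the terms
# ["detention", "torture", "prison", "land", "eviction", "displacement"].
STORY_TERM_COUNTS = [
    [3, 2, 3, 0, 0, 0],  # story 0 (labelled)
    [0, 0, 0, 3, 2, 3],  # story 1 (labelled)
    [2, 3, 2, 0, 0, 1],  # story 2 (unlabelled)
    [1, 0, 0, 2, 3, 2],  # story 3 (unlabelled)
]
# Hypothetical seed labels; each story may carry several abuse types (multi-label).
SEED_LABELS = {0: {"detention"}, 1: {"forced_eviction"}}


def factorize(matrix, rank=2, epochs=2000, lr=0.01, reg=0.01, seed=0):
    """Approximate matrix ~= P @ Q.T using entry-wise, L2-regularised gradient
    updates. Returns P (one latent vector per story) and Q (one per term)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    P = [[rng.random() * 0.1 for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.random() * 0.1 for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i in range(n_rows):
            for j in range(n_cols):
                err = matrix[i][j] - sum(P[i][k] * Q[j][k] for k in range(rank))
                for k in range(rank):
                    p, q = P[i][k], Q[j][k]
                    P[i][k] += lr * (err * q - reg * p)
                    Q[j][k] += lr * (err * p - reg * q)
    return P, Q


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_labels(P, labels, story, k=2, threshold=0.5):
    """Recommend labels for `story` from its k most similar labelled stories.
    Each neighbour votes for its labels with weight max(cosine, 0); labels
    whose summed vote reaches `threshold` are returned."""
    neighbours = sorted(
        ((cosine(P[story], P[other]), other) for other in labels if other != story),
        reverse=True,
    )[:k]
    votes = {}
    for sim, other in neighbours:
        for label in labels[other]:
            votes[label] = votes.get(label, 0.0) + max(sim, 0.0)
    return {label for label, score in votes.items() if score >= threshold}


def augment_labels(matrix, seed_labels, rounds=3, k=2, threshold=0.5, rank=2):
    """Simplified single-view self-training (not true co-training): repeatedly
    label unlabelled stories from similar labelled ones and add them to the set."""
    labels = {s: set(ls) for s, ls in seed_labels.items()}  # copy; input untouched
    P, _ = factorize(matrix, rank=rank)  # unsupervised, so computed only once
    for _ in range(rounds):
        new = {}
        for story in range(len(matrix)):
            if story not in labels:
                recommended = recommend_labels(P, labels, story, k, threshold)
                if recommended:
                    new[story] = recommended
        if not new:
            break  # no remaining story received a confident recommendation
        labels.update(new)
    return labels


if __name__ == "__main__":
    for story, story_labels in sorted(augment_labels(STORY_TERM_COUNTS, SEED_LABELS).items()):
        print(story, sorted(story_labels))
```

With this toy data, each unlabelled story should inherit the label of the labelled story whose latent vector points in nearly the same direction. The factorization is computed once because it ignores the labels; only the labelled set grows across rounds, so a story whose nearest neighbours were all unlabelled in one round can still be labelled in a later round.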