Journal: User Modeling and User-Adapted Interaction

Domain-based Latent Personal Analysis and its use for impersonation detection in social media


Abstract

Zipf's law describes an inverse proportionality between a word's rank in a given corpus and its frequency in that corpus, roughly dividing the vocabulary into frequent and infrequent words. Here, we stipulate that within a domain an author's signature can be derived from, in loose terms, the popular words the author does not use and the infrequent words the author uses often. We devise a method, termed Latent Personal Analysis (LPA), for finding domain-based attributes for entities in a domain: their distance from the domain and their signature, which determines how they differ most from the domain. We identify the most suitable distance metric for the method among several candidates and construct the distances and personal signatures for authors, the domain's entities. The signature consists of both over-used terms (compared to the average) and missing popular terms. We validate the correctness and power of the signatures in identifying users and set existence conditions. We test LPA in several domains, both textual and non-textual. We then demonstrate the use of the method in explainable authorship attribution: we define algorithms that utilize LPA to identify two types of impersonation in social media: (1) authors with sockpuppet (multiple) accounts and (2) front-user accounts, operated by several authors. We validate the algorithms and employ them over a large-scale dataset obtained from a social media site with over 4000 users. We corroborate these results using temporal rate analysis. LPA can further be used to devise personal attributes in a wide range of scientific domains in which the constituents have a long-tail distribution of elements.
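The idea of a domain distance and a signed signature can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the distance metric that was chosen, so the sketch assumes a symmetrized KL-style divergence with a small `epsilon` standing in for terms an author never uses, and it assumes the domain is the plain average of the author distributions. The function names and the toy corpus are hypothetical.

```python
from collections import Counter
import math


def term_distribution(tokens):
    """Relative frequency of each term in one author's token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def domain_distribution(author_dists):
    """Assumption: the domain is the unweighted mean of the author distributions."""
    vocab = set().union(*author_dists.values())
    n = len(author_dists)
    return {t: sum(d.get(t, 0.0) for d in author_dists.values()) / n for t in vocab}


def signed_contributions(author, domain, epsilon=1e-6):
    """Per-term contribution to a symmetrized KL-style divergence.

    Each term contributes (p - q) * log(p / q), which is never negative.
    The sign is then set to + for over-used terms (p > q) and
    - for under-used or missing terms (p < q).
    Assumption: a missing term gets probability epsilon, without renormalizing.
    """
    out = {}
    for term, q in domain.items():
        p = author.get(term, epsilon)
        magnitude = (p - q) * math.log(p / q)
        out[term] = magnitude if p >= q else -magnitude
    return out


def distance(author, domain, epsilon=1e-6):
    """Distance from the domain: sum of the absolute per-term contributions."""
    return sum(abs(v) for v in signed_contributions(author, domain, epsilon).values())


def signature(author, domain, epsilon=1e-6, top_k=3):
    """The top_k terms by absolute contribution, keeping their sign."""
    contribs = signed_contributions(author, domain, epsilon)
    ranked = sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:top_k]


# Toy corpus (hypothetical data, for illustration only).
corpus = {
    "alice": "the cat and the dog and the bird".split(),
    "bob": "the dog and the cat and the fish".split(),
    "carol": "zebra zebra zebra cat and dog".split(),
}
dists = {name: term_distribution(tokens) for name, tokens in corpus.items()}
domain = domain_distribution(dists)

for name, dist in dists.items():
    print(name, round(distance(dist, domain), 3), signature(dist, domain))
```

In this toy corpus, the signature for `carol` contains both kinds of element the abstract describes: a missing popular term ("the", negative sign) and an over-used infrequent term ("zebra", positive sign). The value of `epsilon` strongly affects how much weight missing terms receive, so it should be treated as a tunable assumption.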
