Engineering Applications of Artificial Intelligence

A feature selection method for author identification in interactive communications based on supervised learning and language typicality


Abstract

Authorship attribution, understood as identifying which of several candidate authors wrote a given text, has been a very active research area, driven mainly by advances in Natural Language Processing (NLP), machine learning and Computational Intelligence. It has mostly been approached from a literary perspective, aiming to identify the stylometric features and writeprints that unequivocally characterize a writer's patterns and allow their unique identification. On the other hand, the rise of social networking platforms and interactive messaging has undoubtedly made it much easier than in traditional communication media to express feelings anonymously and to share experiences and social relationships. Unfortunately, the popularity of such communities and the virtual identities of their users provide fertile ground for cybercrimes against unsuspecting victims and for other illegal uses of social networks, which call for tracing the activity of accounts. In the context of one-to-one communications, this manuscript proposes identifying the sender of a message as a useful approach for detecting impersonation attacks in interactive communication scenarios. In particular, this work proposes a novel feature selection algorithm that selects linguistic features, extracted from messages via NLP techniques, by dissociating the essential traits of the sender from the influence of the receiver. The performance and computational efficiency of different supervised learning models that incorporate the proposed feature selection method are shown to be promising in terms of identification accuracy on real SMS data, paving the way for future research that applies the concept of language typicality to discourse analysis.
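The abstract does not specify the selection criterion, so the following is only a hypothetical illustration of the general idea of separating sender traits from receiver influence; it is not the authors' algorithm. It assumes each message is already encoded as a numeric feature vector and labelled with its sender and receiver. Each feature is scored as the variance of per-sender means divided by the average, over senders, of the variance of per-receiver means (features that vary by sender but stay stable across receivers score high), and the top-k features are kept. The names `typicality_scores` and `select_features` are invented for this sketch.

```python
from collections import defaultdict
from statistics import mean, pvariance
from typing import Hashable, List, Sequence


def group_means(values: Sequence[float], keys: Sequence[Hashable]) -> dict:
    """Mean of `values` for each distinct key."""
    groups = defaultdict(list)
    for v, k in zip(values, keys):
        groups[k].append(v)
    return {k: mean(vs) for k, vs in groups.items()}


def typicality_scores(X: Sequence[Sequence[float]],
                      senders: Sequence[Hashable],
                      receivers: Sequence[Hashable],
                      eps: float = 1e-9) -> List[float]:
    """Hypothetical score: sender variability / receiver-induced variability."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        # How much the feature's average differs from one sender to another.
        sender_var = pvariance(list(group_means(col, senders).values()))
        # For each sender, how much the feature shifts depending on the receiver.
        by_sender = defaultdict(lambda: ([], []))
        for v, s, r in zip(col, senders, receivers):
            by_sender[s][0].append(v)
            by_sender[s][1].append(r)
        recv_vars = []
        for vals, recs in by_sender.values():
            per_receiver = list(group_means(vals, recs).values())
            recv_vars.append(pvariance(per_receiver) if len(per_receiver) > 1 else 0.0)
        receiver_var = mean(recv_vars)
        scores.append(sender_var / (receiver_var + eps))
    return scores


def select_features(X, senders, receivers, k: int) -> List[int]:
    """Indices of the k highest-scoring features, in ascending order."""
    scores = typicality_scores(X, senders, receivers)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy data: feature 0 tracks the sender, feature 1 tracks the receiver,
    # feature 2 is constant.
    X = [[1, 0, 2], [1, 5, 2], [5, 0, 2], [5, 5, 2]]
    senders = ["A", "A", "B", "B"]
    receivers = ["X", "Y", "X", "Y"]
    print(select_features(X, senders, receivers, k=1))
```

On the toy data only the sender-dependent feature is retained; a real pipeline would pass the reduced feature set to a supervised classifier, as the abstract describes.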
