Conference: IEEE International Conference on Application of Information and Communication Technologies

Methodologies of Internet portals users' short messages texts authorship identification based on the methods of mathematical linguistics



Abstract

The article examines the peculiarities of determining the authorship of short messages posted on Internet portals, blogs, and websites. It focuses on the possibility of finding people who hold several different accounts and send messages from them. How sentences depend on the number of words in portal users' comments is shown. A model of an Internet portal text message is provided. A method for identifying the authorship of Internet portal users' short messages, based on the naive Bayesian classifier, is presented. The distinguishing feature of the proposed method is that it identifies users not only through frequency dictionary analysis of selected messages but also through their use of rules and connections derived from syntactic information about the language. The frequencies of parts of speech and of connections between parts of speech are given. A communication graph of part-of-speech connections for the limited natural language of comments is presented. The linguistic characteristics used to identify a portal user are given. Structures used to identify text authorship are distinguished on the basis of the communication graph between parts of speech, with regard to the prepositional case form of nouns in the limited natural language. An experiment is carried out showing the achievable probability of Internet portal user identification depending on the training sample. Probability diagrams of authorship identification based on the selected characteristics are presented.
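The abstract names three ingredients: a frequency dictionary, part-of-speech frequencies, and frequencies of connections between parts of speech, all fed to a naive Bayesian classifier. The following is a minimal sketch of how such a classifier might be assembled, not the authors' implementation. It assumes messages arrive already tagged as lists of `(word, pos_tag)` pairs, models "connections" as adjacent tag pairs, and uses Laplace smoothing; the class name `NaiveBayesAuthor`, the tag set, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def features(tagged_message):
    """Turn a POS-tagged message into a bag of features.

    tagged_message is a list of (word, pos_tag) pairs; the tagging step
    is assumed to happen elsewhere.
    """
    tags = [tag for _, tag in tagged_message]
    words = [f"w:{word.lower()}" for word, _ in tagged_message]  # frequency dictionary
    pos = [f"p:{tag}" for tag in tags]                           # part-of-speech frequency
    links = [f"l:{a}>{b}" for a, b in zip(tags, tags[1:])]       # adjacent tag pairs
    return words + pos + links


class NaiveBayesAuthor:
    """Multinomial naive Bayes over the features above (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # Laplace smoothing constant
        self.feature_counts = defaultdict(Counter)  # author -> feature -> count
        self.message_counts = Counter()             # author -> number of messages
        self.vocabulary = set()

    def fit(self, tagged_messages, authors):
        for message, author in zip(tagged_messages, authors):
            feats = features(message)
            self.feature_counts[author].update(feats)
            self.message_counts[author] += 1
            self.vocabulary.update(feats)
        return self

    def log_scores(self, tagged_message):
        """Return the unnormalised log posterior for every known author."""
        total_messages = sum(self.message_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for author, n_messages in self.message_counts.items():
            counts = self.feature_counts[author]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_messages / total_messages)  # log prior
            for feat in features(tagged_message):
                if feat not in self.vocabulary:
                    continue  # ignore features never seen in training
                score += math.log((counts[feat] + self.alpha) / denom)
            scores[author] = score
        return scores

    def predict(self, tagged_message):
        scores = self.log_scores(tagged_message)
        return max(scores, key=scores.get)


# Toy training data: two hypothetical accounts with different habits.
train = [
    ([("I", "PRON"), ("love", "VERB"), ("this", "DET"), ("site", "NOUN")], "user_a"),
    ([("I", "PRON"), ("love", "VERB"), ("cats", "NOUN")], "user_a"),
    ([("great", "ADJ"), ("post", "NOUN"), ("about", "ADP"), ("cats", "NOUN")], "user_b"),
    ([("nice", "ADJ"), ("photo", "NOUN"), ("of", "ADP"), ("dogs", "NOUN")], "user_b"),
]
model = NaiveBayesAuthor().fit([m for m, _ in train], [a for _, a in train])
query = [("I", "PRON"), ("love", "VERB"), ("dogs", "NOUN")]
print(model.predict(query))
```

In this sketch, two accounts operated by the same person would tend to score similarly on the same messages, which is one plausible way to approach the multi-account search the abstract describes. Real comments would need an actual part-of-speech tagger and far more training text per author.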
