Conference: IEEE International Conference on Application of Information and Communication Technologies
Methodologies of Internet portals users' short messages texts authorship identification based on the methods of mathematical linguistics

Abstract

This article examines the specifics of determining the authorship of short text messages on Internet portals, blogs and websites. It focuses on the possibility of finding people who hold several different accounts and send messages from them. The dependence of sentences on the number of words in portal users' comments is presented. A model of an Internet portal text message is provided. A method for identifying the authorship of portal users' short messages, based on the naive Bayesian classifier, is presented. The distinguishing feature of the proposed method is that it identifies users not only through frequency dictionary analysis of selected messages, but also through their use of rules and connections derived from the syntactic information of the language. Part-of-speech frequencies and the frequencies of connections between parts of speech are given. A communication graph of the connections between parts of speech in the limited natural language of the comments is presented. The linguistic characteristics used to identify a portal user are given. Based on the communication graph between parts of speech, structures involving the prepositional case forms of nouns in the limited natural language are distinguished and used to identify text authorship. An experiment is carried out showing the achievable probability of identifying an Internet portal user depending on the training sample. Probability diagrams of authorship identification based on the selected characteristics are presented.
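As a rough illustration of the kind of classifier the abstract names, the following is a minimal sketch of a multinomial naive Bayes classifier over word frequencies with add-one smoothing. It is not the authors' implementation: the class name `NaiveBayesAuthorship`, the whitespace tokenizer, and the sample messages are all hypothetical, and the part-of-speech and connection-frequency features described in the abstract are omitted because they would require a tagger for the target language.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (an assumed stand-in for real linguistic analysis)."""
    return text.lower().split()


class NaiveBayesAuthorship:
    """Hypothetical multinomial naive Bayes over token counts with add-one smoothing."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # author -> token -> count
        self.message_counts = Counter()           # author -> number of training messages
        self.vocabulary = set()

    def fit(self, messages):
        """Train on an iterable of (author, text) pairs."""
        for author, text in messages:
            tokens = tokenize(text)
            self.token_counts[author].update(tokens)
            self.message_counts[author] += 1
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, text):
        """Return log P(author) + sum of log P(token | author) for each known author."""
        total_messages = sum(self.message_counts.values())
        vocab_size = len(self.vocabulary)
        tokens = tokenize(text)
        scores = {}
        for author, counts in self.token_counts.items():
            total_tokens = sum(counts.values())
            score = math.log(self.message_counts[author] / total_messages)
            for token in tokens:
                # Add-one smoothing keeps unseen tokens from zeroing the product.
                score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            scores[author] = score
        return scores

    def predict(self, text):
        """Return the author with the highest posterior log score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented sample data, for illustration only.
training = [
    ("user_a", "honestly the update is great honestly"),
    ("user_a", "honestly i like the new design"),
    ("user_b", "lol this portal is broken again"),
    ("user_b", "lol broken links everywhere"),
]
model = NaiveBayesAuthorship().fit(training)
print(model.predict("lol the design is broken"))
```

Under the same assumptions, syntactic features could be added by appending part-of-speech tags or tag pairs to the token stream before counting, so that the classifier scores both vocabulary and structure.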
