Methodologies of Internet portals users' short messages texts authorship identification based on the methods of mathematical linguistics

Abstract

The article examines the specifics of determining the authorship of short text messages posted on Internet portals, blogs, and websites. It focuses on the possibility of finding people who hold several different accounts and send messages from them. The dependence of sentences on the number of words in portal users' comments is shown. A model of an Internet portal text message is provided. A method for identifying the authorship of portal users' short messages, based on the naive Bayesian classifier, is presented. The distinguishing feature of the proposed method is that it identifies users not only through frequency dictionary analysis of selected messages, but also through their use of rules and connections derived from syntactic information about the language. The frequencies of parts of speech and of connections between parts of speech are given. A communication graph of part-of-speech connections for the limited natural language used in comments is presented. The linguistic characteristics used to identify a portal user are given. Based on the communication graph between parts of speech, structures built around the prepositional case forms of nouns in the limited natural language are singled out for identifying text authorship. An experiment shows the achievable probability of identifying an Internet portal user as a function of the training sample. Probability diagrams of authorship identification based on the selected characteristics are presented.
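
The abstract names the ingredients (a naive Bayesian classifier, a frequency dictionary, part-of-speech frequencies, and frequencies of connections between parts of speech) but not the exact formulation. The following is only a minimal sketch of how such features could feed a multinomial naive Bayes model, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the class name `NaiveBayesAuthorModel`, the helper `extract_features`, the tag names, Laplace smoothing with `alpha`, and the premise that messages arrive already tagged as `(word, part_of_speech)` pairs.

```python
import math
from collections import Counter, defaultdict


def extract_features(tagged_message):
    """Turn one pre-tagged message into a bag of features (hypothetical helper).

    tagged_message is a list of (word, pos_tag) pairs. Three feature
    families are produced: word tokens (frequency dictionary), part-of-speech
    tags, and adjacent tag pairs (connections between parts of speech).
    """
    words = [w.lower() for w, _ in tagged_message]
    tags = [t for _, t in tagged_message]
    features = [f"w={w}" for w in words]
    features += [f"pos={t}" for t in tags]
    features += [f"link={a}>{b}" for a, b in zip(tags, tags[1:])]
    return features


class NaiveBayesAuthorModel:
    """Multinomial naive Bayes over the features above (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # Laplace smoothing, an assumption
        self.feature_counts = defaultdict(Counter)  # author -> feature -> count
        self.message_counts = Counter()             # author -> number of messages
        self.vocabulary = set()                     # all features seen in training

    def train(self, author, tagged_message):
        feats = extract_features(tagged_message)
        self.feature_counts[author].update(feats)
        self.message_counts[author] += 1
        self.vocabulary.update(feats)

    def log_scores(self, tagged_message):
        """Return log P(author) + sum of log P(feature | author) per author."""
        feats = extract_features(tagged_message)
        total_messages = sum(self.message_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for author, n_msgs in self.message_counts.items():
            counts = self.feature_counts[author]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_msgs / total_messages)
            for f in feats:
                if f not in self.vocabulary:
                    continue  # features never seen in training carry no evidence
                score += math.log((counts[f] + self.alpha) / denom)
            scores[author] = score
        return scores

    def predict(self, tagged_message):
        scores = self.log_scores(tagged_message)
        return max(scores, key=scores.get)


# Toy usage with invented accounts and messages.
model = NaiveBayesAuthorModel()
model.train("user_a", [("well", "ADV"), ("i", "PRON"), ("think", "VERB"), ("so", "ADV")])
model.train("user_a", [("well", "ADV"), ("i", "PRON"), ("agree", "VERB")])
model.train("user_b", [("the", "DET"), ("article", "NOUN"), ("is", "VERB"), ("wrong", "ADJ")])
model.train("user_b", [("the", "DET"), ("author", "NOUN"), ("is", "VERB"), ("biased", "ADJ")])
print(model.predict([("well", "ADV"), ("i", "PRON"), ("disagree", "VERB")]))
```

In this toy setup two accounts that score highest for the same author model would be candidates for belonging to one person. Whether the paper weights the three feature families equally, as the sketch does, is not stated in the abstract.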

Bibliographic record

Source
IEEE International Conference on Application of Information and Communication Technologies | 2014 | 6 pages
Authors
Milhail Sukhoparov; Ilya Lebedev
Original format: PDF
Chinese Library Classification: Communications
Keywords
Bayes methods; Internet; Web sites; graph theory; natural language processing; pattern classification; portals; text analysis; Internet portal text message; Internet portals user short message text authorship identification; Websites; blogs; commentaries; communication graph; connection frequency; frequency dictionary analysis; language syntactic information; limited natural language; linguistic characteristics; mathematical linguistics; message selection; naive Bayesian classifier; noun prepositional casal form; parts of speech connections; people searching; portal user comment; portal user identification; probability diagrams; short message text authorship determination; speech frequency; Bayes methods; Internet; Natural languages; Portals; Pragmatics; Speech; Training; Authorship identification; Bayesian classifier; text information classification;

Similar literature

1. Mechanism of Establishing Authorship of Short Messages Posted by Users of Internet Portals by Methods of Mathematical Linguistics [J]. M. E. Sukhoparov. Automatic Control and Computer Sciences. 2015, No. 8
2. A cluster analysis of text message users based on their demand for text messaging: A behavioral economic approach [J]. Hayashi Yusuke, Friedel Jonathan E., Foreman Anne M. Journal of the Experimental Analysis of Behavior. 2019, No. 3
3. Semi-literate Texting (SLT): Survey based text message dataset from digitally semi-literate users in India [J]. Prawaal Sharma, Navneet Goyal, Vinay MR. Data in Brief. 2021, No. a
4. Methodologies of Internet portals users' short messages texts authorship identification based on the methods of mathematical linguistics [C]. Milhail Sukhoparov, Ilya Lebedev. IEEE International Conference on Application of Information and Communication Technologies. 2014
5. A methodological framework for automated support to text-based identification of system model elements [D]. Park, Sooyong. 1995
6. Methods system errors and demographic differences in participant errors using daily text message-based short message service computer-assisted self-interview (SMS-CASI) to measure sexual risk behavior in a RCT of HIV self-test use [O]. William Brown III, Alan Sheinfil, Javier Lopez-Rios. 2019
7. Evaluation of the Performance and Efficiency of the Automated Linguistic Features for Author Identification in Short Text Messages Using Different Variable Selection Techniques [O]. Refat Aljumily. 2018
