Hidden Markov-based LDA Internet Sensitive Information Text Filtering

Abstract

We are in an era of rapid growth in Internet information [1], with billions of texts generated every day. Under this “digital torrent”, ensuring the ecological security and healthy development of the Internet has become a technical challenge. Rocchio [2] proposed a linear classifier, a classification algorithm based on the linear vector space model. As hardware has advanced, machine learning models have become the mainstream; the main training methods are linear regression [3], K-Nearest Neighbor [4], neural network models [5], and Support Vector Machines [6]. This paper proposes a statistical approach: a hidden Markov model built on feature keywords and themes. We use the TextRank algorithm [7] to extract feature words from large data sets. We then use the Apriori algorithm [8] to quantify the implicit relationships between feature words, which yields a feature-word confidence matrix from which we build an HMM-LDA correlation model. From a text document we derive an associated state matrix and a state transition probability matrix, and from these we determine the probability of the visible (observed) state chain. On this basis, sensitive Internet text can be identified and filtered.
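The abstract names TextRank [7] as the feature-word extractor. As a rough illustration of that step, and not the authors' code, the sketch below applies the usual TextRank recipe to a document that is assumed to be already tokenized and stopword-filtered: words become graph nodes, words that co-occur within a sliding window are linked, and a PageRank-style iteration with damping 0.85 ranks them. The function name `textrank_keywords`, the window size of 4, and the fixed iteration count are illustrative assumptions.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=4, damping=0.85, iterations=50, top_k=5):
    """Rank candidate feature words with TextRank over a co-occurrence graph."""
    # Undirected graph: two distinct words are linked if they appear
    # within `window` consecutive tokens of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank-style update, repeated a fixed number of times.
    # The comprehension reads the previous `scores` before reassignment.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            for w in neighbors
        }

    # Highest score first; ties broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


if __name__ == "__main__":
    doc = ["gambling", "site", "bonus", "casino", "gambling",
           "bonus", "deposit", "casino", "gambling"]
    print(textrank_keywords(doc, top_k=3))
```

On this toy token list the three top-ranked words are "bonus", "casino" and "gambling", the three words linked to every other word in the graph. Real input would first need word segmentation (especially for Chinese text) and a stopword list.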
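The abstract says the Apriori algorithm [8] quantifies implicit relationships between feature words and produces a feature-word confidence matrix, but it does not spell out the construction. The sketch below assumes the standard association-rule definition, conf(a → b) = supp(a ∪ b) / supp(a), restricted to word pairs, with each document treated as a transaction (the set of feature words it contains). Only Apriori's pruning idea is kept: a pair is examined only when both words are individually frequent. `confidence_matrix` and `min_support` are hypothetical names.

```python
from itertools import combinations


def confidence_matrix(transactions, vocab, min_support=0.0):
    """Pairwise association-rule confidence between feature words.

    matrix[i][j] = conf(vocab[i] -> vocab[j]) = supp(i and j) / supp(i).
    """
    n = len(transactions)
    docs = [set(t) for t in transactions]
    # Level-1 supports: fraction of documents containing each word.
    support = {w: sum(w in d for d in docs) / n for w in vocab}
    # Apriori property: a pair can only be frequent if both words are frequent.
    frequent = [w for w in vocab if support[w] > 0 and support[w] >= min_support]
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in vocab]
    for a, b in combinations(frequent, 2):
        pair = sum(a in d and b in d for d in docs) / n
        if pair == 0 or pair < min_support:
            continue  # infrequent pair: leave its confidence at 0
        matrix[index[a]][index[b]] = pair / support[a]
        matrix[index[b]][index[a]] = pair / support[b]
    return matrix


if __name__ == "__main__":
    docs = [
        {"casino", "bonus"},
        {"casino", "bonus", "deposit"},
        {"casino"},
        {"deposit"},
    ]
    for row in confidence_matrix(docs, ["casino", "bonus", "deposit"]):
        print([round(x, 2) for x in row])
```

The matrix is asymmetric by design: in this toy data every document containing "bonus" also contains "casino", so conf(bonus → casino) = 1.0, while conf(casino → bonus) is only 2/3.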
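The final step in the abstract computes the probability of the visible state chain from a state transition matrix. The standard way to compute P(O | λ) for an HMM is the forward algorithm, sketched below under explicit assumptions: two hypothetical hidden states ("normal" and "sensitive") stand in for the paper's themes, the observations are feature words, and every probability is made up. The decision rule in `is_sensitive` flags a text when the posterior probability of the "sensitive" state at the last word exceeds 0.5. That rule is our assumption, not necessarily the paper's filtering criterion.

```python
# Toy model: every state name and probability here is a hypothetical example.
STATES = ("normal", "sensitive")
START = {"normal": 0.8, "sensitive": 0.2}
TRANS = {
    "normal": {"normal": 0.9, "sensitive": 0.1},
    "sensitive": {"normal": 0.3, "sensitive": 0.7},
}
EMIT = {
    "normal": {"news": 0.5, "weather": 0.35, "bonus": 0.1, "casino": 0.05},
    "sensitive": {"news": 0.1, "weather": 0.05, "bonus": 0.4, "casino": 0.45},
}


def forward(obs, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Forward algorithm: return P(obs | model) and the final alpha vector."""
    if not obs:
        return 0.0, {}
    # alpha[s] = P(o_1..o_t, q_t = s); unseen words get emission probability 0.
    alpha = {s: start[s] * emit[s].get(obs[0], 0.0) for s in states}
    for o in obs[1:]:
        alpha = {
            s: emit[s].get(o, 0.0) * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values()), alpha


def is_sensitive(obs, threshold=0.5):
    """Hypothetical rule: flag if P(q_T = sensitive | obs) exceeds threshold."""
    total, alpha = forward(obs)
    if total == 0.0:  # no state can produce the sequence; do not flag
        return False
    return alpha["sensitive"] / total > threshold


if __name__ == "__main__":
    print(forward(["bonus", "casino"])[0])
    print(is_sensitive(["bonus", "casino"]), is_sensitive(["news", "weather"]))
```

For ["bonus", "casino"] the forward probability is 0.0048 + 0.0288 = 0.0336, and the posterior for "sensitive" is about 0.86, so the text is flagged. ["news", "weather"] gives a posterior of about 0.02 and passes. A practical model would smooth emission probabilities so that unseen words do not drive the likelihood to zero.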

机译：我们正处于互联网信息迅速发展的时代[1]，每天生成数十亿条文本。在这样的“数字洪流”的考验下，如何确保互联网的生态安全和健康发展已成为一项技术挑战。 Rocchio [2]提出了一种线性分类器，它是一种基于线性向量空间模型的分类算法。随着硬件设备的发展，机器学习模型已经成为主流。主要的训练方法是线性回归[3]，K最近邻[4]，神经网络模型[5]和支持向量机[6]。本文提出了一种基于特征关键词-主题的隐马尔可夫模型。这是一种基于统计的方法。我们使用Textrank算法[7]从大量数据集中提取特征词。使用Apriori算法[8]量化特征词之间的隐式关系，我们可以生成特征词置信度矩阵并建立HMM-LDA相关模型。根据可以产生关联状态矩阵和概率状态转移矩阵的文本文件，我们可以确定可见状态链的转换概率。这样我们就可以过滤和识别敏感的Internet信息文本。（摘要）

Bibliographic Details

Source: 2020, pp. 1-6 (6 pages)
Authors: Haoze Yu; Guidong Zhang; Yongjun Shen
Original format: PDF
Keywords: component; Numerical Analysis; Matrix model; Algorithm design and analysis; Content Analysis and Indexing; Word processing

机译：组件;数值分析;矩阵模型;算法设计与分析;内容分析与索引;文字处理（关键词）;

Similar Documents

1. Text Categorization for Internet Content Filtering [J]. José M. Gómez, Ignacio Giráldez, Manuel de Buenaga. Inteligencia Artificial: Ibero-American Journal of Artificial Intelligence, 2004, no. 22.

机译：用于Internet内容过滤的文本分类
2. Fast Convergence Identification of Hidden Markov Models Using Risk-Sensitive Filters [J]. J. S. Thorne, J. B. Moore. Nonlinear Analysis: An International Multidisciplinary Journal, 2001, no. 4.

机译：使用风险敏感过滤器快速隐藏Markov模型的收敛识别
3. Risk-Sensitive Filtering and Smoothing for Hidden Markov Models [J]. S. Dey, J. B. Moore. Systems and Control Letters, 1995, no. 5.

机译：隐马尔可夫模型的风险敏感滤波和平滑
4. Filtering Spam Text Messages by Using Twitter-LDA Algorithm [C]. Dani Gunawan, Romi Fadillah Rahmat, Arsandi Putra. IEEE International Conference on Communication, Networks and Satellite, 2018.

机译：使用Twitter-LDA算法过滤垃圾短信
5. The Internet Filtering Dilemma: A Qualitative Analysis of the Beliefs, Themes, and Patterns Associated with Internet Filtering in Kansas K-12 Schools [D]. Ken Brown, 2004.

机译：互联网过滤难题：对堪萨斯州K--12学校与互联网过滤相关的信念，主题和模式的定性分析。
6. Recommendations for Performing Internet-Based Research on Sensitive Subject Matter with Hidden or Difficult-to-Reach Populations [O]. Hugh Klein, Thomas P. Lambing, David A. Moskowitz.

机译：用隐藏或难以达到群体对敏感主题进行基于互联网的研究的建议
7. Hidden Markov Model Based Finnish Text-to-Speech System Utilizing Glottal Inverse Filtering [O]. Tuomo Raitio, 2008.