IEEE International Conference on Power, Intelligent Computing and Systems

Semantic Information Detection of Webpage Based on Word Vector and Infomap



Abstract

For Chinese web pages, we use regular expressions and the Viterbi algorithm to filter Chinese text and segment it into words, then use the ngram2vec algorithm to obtain the word vector set of the web page and to pre-train the word vector set of Baidu Encyclopedia. The Baidu Encyclopedia word vectors are clustered with the Infomap algorithm and the resulting clusters are tagged with types; a neural network is then trained on the training data set and the Baidu Encyclopedia corpus so that the type of an unknown web page can be determined through the network, thereby detecting its semantic information. The algorithm has few hyperparameters and high computational efficiency. Experiments show that the trained neural network model reaches an accuracy of 96.73% and can quickly and accurately identify the type of a web page.
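The abstract does not include code; the sketch below is only a minimal illustration of the described pipeline under several assumptions: jieba stands in for the paper's regex-plus-Viterbi segmentation, gensim-format pre-trained vectors stand in for the ngram2vec output (the file name baike_vectors.txt is a placeholder), and word clustering uses the infomap Python package over an assumed k-nearest-neighbour similarity graph whose neighbourhood size k is not taken from the paper.

import re

import jieba                              # HMM/Viterbi-based Chinese segmenter, a stand-in for the paper's segmenter
from gensim.models import KeyedVectors    # loads pre-trained vectors (e.g. ngram2vec output in word2vec text format)
from infomap import Infomap               # Python bindings of the Infomap community-detection tool


def extract_chinese_words(html_text):
    """Keep only Chinese characters via a regular expression, then segment into words."""
    chinese_only = "".join(re.findall(r"[\u4e00-\u9fff]+", html_text))
    # Dropping single-character tokens is a common heuristic, not a detail from the paper.
    return [w for w in jieba.cut(chinese_only) if len(w) > 1]


# Hypothetical path: pre-trained Baidu Encyclopedia vectors exported in word2vec text format.
vectors = KeyedVectors.load_word2vec_format("baike_vectors.txt")


def cluster_page_words(words, k=10):
    """Cluster the page's word vectors with Infomap over a k-nearest-neighbour similarity graph."""
    words = [w for w in words if w in vectors]
    ids = {w: i for i, w in enumerate(words)}
    im = Infomap("--two-level --silent")
    for w in words:
        # Link each word to its k most similar vocabulary neighbours that also
        # appear on the page, weighted by cosine similarity (an assumed graph construction).
        for neighbour, sim in vectors.most_similar(w, topn=k):
            if neighbour in ids and sim > 0:
                im.add_link(ids[w], ids[neighbour], sim)
    im.run()
    modules = im.get_modules()            # maps node id -> Infomap module (cluster) id
    return {w: modules.get(ids[w]) for w in words}


if __name__ == "__main__":
    page_words = extract_chinese_words("<html>示例网页正文</html>")
    print(cluster_page_words(page_words))

The paper's final step, training a neural network on the tagged clusters and the Baidu Encyclopedia corpus to classify unknown pages, would sit on top of the cluster labels returned by cluster_page_words; that classifier is not sketched here.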
