Lexical String Similarity Computation for Recognising Variant Keywords in Objectionable Text

Li Shaoqing (李少卿); Wu Chengrong (吴承荣); Zeng Jianping (曾剑平); Zhong Yiping (钟亦平)


Abstract

With the development of Internet technology, text-based network applications such as chat rooms and BBS have proliferated. To keep the online environment civil, many of these applications filter profanities posted by users. To evade filtering, some malicious users disguise the profanities in their posts, so recognising disguised profanities is an important problem. This paper presents an algorithm that recognises disguised profanities by computing the string similarity of variant sensitive words. The algorithm has three features: (1) its similarity scores are close to human judgement; (2) its time complexity is low; (3) its recognition rate for disguised profanities is high. The computed similarity value determines whether a suspected sensitive word is filtered. Experimental data show that the algorithm outperforms state-of-the-art string similarity metrics on newly coined profanities.
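The abstract does not spell out the paper's similarity formula, so the following is only a minimal sketch of the general idea it describes: score a suspected word against a sensitive-word lexicon and filter it when the score passes a threshold. It uses plain normalised edit distance (edit distance is one of the listed keywords) as a stand-in, not the authors' metric. The function names, the lexicon, and the threshold of 0.7 are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the rolling row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(candidate: str, sensitive: str) -> float:
    """Normalised similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(candidate), len(sensitive))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(candidate, sensitive) / longest


def should_filter(candidate: str, lexicon: list[str], threshold: float = 0.7) -> bool:
    """Hypothetical decision rule: filter if any lexicon entry is similar enough."""
    return any(similarity(candidate.lower(), word.lower()) >= threshold
               for word in lexicon)
```

With a one-word lexicon `["damn"]`, the variant `"d4mn"` is one substitution away, so its similarity is 0.75 and it would be filtered at the assumed threshold. A metric tuned to match human judgement, as the abstract claims for the paper's method, would presumably weight visually or phonetically similar substitutions differently from arbitrary ones.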

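Bibliographic details

Source
Computer Applications and Software (《计算机应用与软件》) | 2015, No. 3 | pp. 151-157 | 7 pages
Authors
Li Shaoqing (李少卿); Wu Chengrong (吴承荣); Zeng Jianping (曾剑平); Zhong Yiping (钟亦平)
Affiliation
School of Computer Science and Technology, Fudan University, Shanghai 200433 (all four authors)
Original format: PDF
Language of text: Chinese
Chinese Library Classification: Algorithm theory
Keywords
variant; string similarity; algorithm; edit distance; content filtering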

