A Supervised Learning Method for an Automatic Semantic-Unit Noise Filter in Multi-Document Summarization

Gong Shu (龚书); Qu Youli (瞿有利); Tian Shengfeng (田盛丰)

Abstract

Multi-document summarization operates on document sets that contain noise. Existing summarization systems generally use a fixed-threshold noise filter whose threshold is set manually, typically to discard low-frequency units. Experiments show, however, that summarization algorithms differ in their inherent robustness to noise and that the optimal threshold varies with the document set, the summarization algorithm, and the text representation, so a manually set fixed threshold achieves neither good generality nor good filtering performance. To address this, a supervised learning method for generating an automatic noise filter is proposed: labels are obtained automatically from human-written summaries, multiple features are extracted for each semantic unit, and a semantic-unit classifier is trained to serve as the noise filter. The resulting filter applies to semantic units produced by different text representations and can automatically remove noisy semantic units from any document set at the preprocessing stage of different multi-document summarization systems. Experiments show that the automatic noise filter generalizes across document sets, summarization algorithms, and text representations, and that its filtering improves, to varying degrees, both the speed of each summarization algorithm and the quality of the extracted summaries.
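
The contrast between a fixed-threshold filter and a learned filter can be illustrated with a minimal sketch. The abstract does not name the features, the classifier, or the labeling rule, so everything below is an assumption made for illustration: semantic units are whitespace-separated words, the features are relative term frequency and document ratio, a unit is labeled informative when it occurs in the human-written summary, and the classifier is a small logistic regression trained by gradient descent. All function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: a "semantic unit" is simply a lowercased word.
    return text.lower().split()


def fixed_threshold_filter(docs, threshold):
    """Baseline: keep units whose total frequency reaches a hand-set threshold."""
    tf = Counter(t for d in docs for t in tokenize(d))
    return {u for u, c in tf.items() if c >= threshold}


def unit_features(docs):
    """Hypothetical features per unit: [bias, relative term frequency, document ratio]."""
    tokenized = [tokenize(d) for d in docs]
    tf = Counter(t for toks in tokenized for t in toks)
    df = Counter(t for toks in tokenized for t in set(toks))
    total = sum(tf.values())
    return {u: [1.0, tf[u] / total, df[u] / len(docs)] for u in tf}


def auto_labels(features, human_summary):
    """Assumed labeling rule: 1 if the unit occurs in the human summary, else 0 (noise)."""
    summary_units = set(tokenize(human_summary))
    return {u: 1 if u in summary_units else 0 for u in features}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(w, x):
    # Predicted probability that a unit is informative.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic(X, y, epochs=2000, lr=0.5):
    """Batch gradient descent on the mean logistic loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = score(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def learned_filter(docs, w, cutoff=0.5):
    """Keep the units that the trained classifier scores at or above the cutoff."""
    feats = unit_features(docs)
    return {u for u, x in feats.items() if score(w, x) >= cutoff}


if __name__ == "__main__":
    train_docs = [
        "flood hits city center",
        "flood damages city roads",
        "mayor visits flood area today",
    ]
    feats = unit_features(train_docs)
    labels = auto_labels(feats, "flood damages city")
    units = sorted(feats)
    weights = train_logistic([feats[u] for u in units], [labels[u] for u in units])
    print("fixed threshold:", sorted(fixed_threshold_filter(train_docs, 2)))
    print("learned filter: ", sorted(learned_filter(train_docs, weights)))
```

The point of the sketch is the design difference: the baseline needs a threshold chosen by hand for each setting, whereas the learned filter derives its decision boundary from summaries people have already written, which is what lets one trained filter be reused across document sets.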

Bibliographic details

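Source
Journal of Computer Research and Development (《计算机研究与发展》) | 2013, No. 4 | pp. 873-882 | 10 pages
Authors
Gong Shu; Qu Youli; Tian Shengfeng
Affiliation

School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044

Original format: PDF
Text language: Chinese
CLC classification: Information processing
Keywords
Automatic noise filtering; supervised learning; multi-document summarization; text representation; preprocessing
Indexed: 2022-08-18 04:47:08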

