Vocabulary Filtering for Term Weighting in Archived Question Search


Abstract

This paper proposes the notion of vocabulary filtering in a term weighting framework that consists of three filters, at the document level, collection level, and vocabulary level. While term frequency and document frequency, along with their variations, are the dominant term weighting factors at the document level and collection level respectively, vocabulary-level factors are seldom considered in current models. In a way, stopword removal can be seen as a vocabulary-level filter, but it is not well integrated into current term weighting models. In this paper, we propose a vocabulary filtering and multi-level term weighting model by integrating a point-wise divergence-based measure into the commonly used TF-IDF model. With our proposed model, the specificity of the vocabulary is captured as a new factor in term weighting, and stopwords are handled naturally within the model rather than being removed according to a separately constructed list. Experiments on searching for similar questions in a large community-based question answering archive show that (a) our proposed multi-level term weighting model is consistently better than single-level models for the retrieval task; and (b) the proposed vocabulary filter distinguishes salient terms from trivial ones well and can be used to construct stopword lists.
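
The abstract does not give the exact formula, so the following is only an illustrative sketch of how a three-level weight might be assembled. It assumes the vocabulary-level factor is a term's point-wise contribution to the KL divergence between the question archive and a general background corpus, clipped at zero, and that the three factors are simply multiplied. The function names, the smoothing, and the toy data are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def vocabulary_factor(term, coll_counts, coll_total, bg_counts, bg_total):
    """Assumed vocabulary-level factor: point-wise KL contribution, clipped at 0."""
    p_coll = coll_counts[term] / coll_total
    if p_coll == 0:
        return 0.0
    # Add-one smoothing keeps the ratio finite for terms unseen in the background.
    p_bg = (bg_counts[term] + 1) / (bg_total + len(bg_counts) + 1)
    return max(0.0, p_coll * math.log(p_coll / p_bg))


def weight_terms(doc, collection, background):
    """Return {term: tf * idf * vocabulary_factor} for one tokenized document."""
    n_docs = len(collection)
    df = Counter(t for d in collection for t in set(d))    # document frequency
    coll_counts = Counter(t for d in collection for t in d)
    coll_total = sum(coll_counts.values())
    bg_counts = Counter(background)
    bg_total = len(background)

    weights = {}
    for term, tf in Counter(doc).items():
        # Smoothed IDF; terms absent from the collection get 0.
        idf = math.log(1 + n_docs / df[term]) if df[term] else 0.0
        vf = vocabulary_factor(term, coll_counts, coll_total, bg_counts, bg_total)
        weights[term] = tf * idf * vf
    return weights


if __name__ == "__main__":
    # Toy question archive and toy background text (both made up).
    archive = [
        ["how", "is", "the", "insulin", "dose", "set"],
        ["what", "is", "the", "insulin", "pump"],
        ["is", "the", "dose", "safe"],
    ]
    general = ["the", "is", "the", "a", "is", "of", "the",
               "how", "what", "is", "the", "safe", "set"]
    for term, w in sorted(weight_terms(archive[0], archive, general).items(),
                          key=lambda kv: -kv[1]):
        print(f"{term:8s} {w:.4f}")
```

Under these assumptions, a term that is no more frequent in the archive than in general text receives a vocabulary factor of zero, so it drops out of the weighting without a separate stopword list. Collecting the zero-weight terms would be one simple way to derive such a list from the filter.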


Bibliographic details

Source: Pacific Asia conference on knowledge discovery and data mining; PAKDD 2010; Workshop on data mining for healthcare management; DMHM 2010; Pacific Asia workshop on intelligence and security informatics; PAISI 2010; Workshop on feature selection in data mining; FSDM 2010; Workshop on behavior informatics; BI 2010; Workshop on data mining and knowledge discovery for e-governance; DMEG 2010; Workshop on knowledge discovery for rural systems; KDRS 2010; Workshop on emerging research trends in vehicle health management; VHM 2010 | 2010 | pp. 383-390 | 8 pages
Authors: Zhao-Yan Ming; Kai Wang; Tat-Seng Chua
Original format: PDF
Classification (Chinese Library Classification): Programming, software engineering
