An automatic filtering method for field association words by deleting unnecessary words

E. GHADA; E.-S. ATLAM; M. FUKETA; K. MORITA; J.-I. AOE

Abstract

Document classification and summarization are important for text retrieval. Humans can generally recognize the field of a document, such as (Sports) or (Politics), from specific words in it, called Field Association (FA) words. Because the quality of the FA words produced by the traditional method depends on learning data pre-classified by hand, that method registers misleading redundant words (unnecessary words). Recall and precision of document classification are therefore degraded when the hand-assigned fields are ambiguous. We propose two criteria: deleting unnecessary words with low frequencies, and deleting unnecessary words using category information. With these criteria, unnecessary words can be deleted from an FA word dictionary created by the traditional method. In experiments, the presented method automatically identified and deleted 25% of 38 372 FA word candidates as unnecessary. Precision and F-measure improved by 26% and 15%, respectively, compared with the traditional method.
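
The abstract names the two deletion criteria but does not define them, so the following is only an illustrative sketch, not the authors' method. It assumes each FA word candidate comes with per-field occurrence counts, reads "low frequency" as a total count below a hypothetical threshold `min_total_freq`, and stands in for "category information" with a hypothetical concentration test `min_field_share` (the share of occurrences falling in the word's dominant field). The function name, both thresholds, and the sample data are assumptions made for this example.

```python
def filter_fa_candidates(candidates, min_total_freq=3, min_field_share=0.6):
    """Split FA word candidates into kept and deleted words.

    candidates: dict mapping word -> dict mapping field name -> count.
    Both thresholds are hypothetical values chosen for illustration.
    """
    kept = {}      # word -> field it is associated with
    deleted = {}   # word -> reason it was judged unnecessary
    for word, field_counts in candidates.items():
        total = sum(field_counts.values())
        # Assumed criterion 1: too few occurrences to be reliable.
        if total < min_total_freq:
            deleted[word] = "low frequency"
            continue
        # Assumed criterion 2: occurrences are not concentrated in one field.
        top_field, top_count = max(field_counts.items(), key=lambda kv: kv[1])
        if top_count / total < min_field_share:
            deleted[word] = "spread across fields"
            continue
        kept[word] = top_field
    return kept, deleted


# Made-up counts, not data from the paper.
candidates = {
    "goalkeeper": {"Sports": 12, "Politics": 1},
    "election": {"Politics": 20},
    "yesterday": {"Sports": 5, "Politics": 6},
    "zugzwang": {"Sports": 1},
}
kept, deleted = filter_fa_candidates(candidates)
print(kept)
print(deleted)
```

In this toy run, "yesterday" is dropped because it appears about equally in both fields, and "zugzwang" is dropped because it occurs only once. The criteria actually used in the paper may differ from both tests shown here.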

Bibliographic details

Source: International Journal of Computer Mathematics, 2006, Issue 3, pp. 247-261 (15 pages)
Authors: E. GHADA; E.-S. ATLAM; M. FUKETA; K. MORITA; J.-I. AOE
Affiliation: Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima 770-8506, Japan
Original format: PDF
Language: English
Chinese Library Classification: Operations research
Keywords: field association words; document classification; unnecessary words; precision; F-measure
Added to database: 2022-08-18 03:01:58
