Keyword Extraction Based on tf/idf for Chinese News Document

LI Juanzi; FAN Qina; ZHANG Kuo

Abstract

Keyword extraction is an important research topic in information retrieval. This paper specifies what counts as a keyword in Chinese news documents, based on an analysis of the linguistic characteristics of news documents, and then proposes a new keyword extraction method based on tf/idf with multiple strategies. The approach selects uni-, bi- and tri-grams as candidate keywords and then defines features according to their morphological characteristics and context information. Moreover, the paper proposes several strategies to repair incomplete words produced by word segmentation and to find unknown potential keywords in news documents. Experimental results show that the proposed method significantly outperforms the baseline method. The method was also applied to retrospective event detection, where experimental results show that both accuracy and efficiency improve significantly.
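
The abstract does not give the scoring formula or the additional strategies, so the following is only a minimal sketch of plain tf/idf ranking over uni-, bi- and tri-gram candidates, not the authors' method. It assumes the documents are already word-segmented into token lists, uses the common tf × log(N/df) weighting, and leaves out the morphological and context features and the repair of incomplete words. The function names `ngrams` and `tfidf_keywords` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, max_n=3):
    """Yield every uni-, bi- and tri-gram of a token list as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the n-gram candidates of docs[doc_index] by tf * idf."""
    # Document frequency: number of documents containing each candidate.
    df = Counter()
    for tokens in docs:
        df.update(set(ngrams(tokens)))

    # Term frequency of each candidate within the target document.
    tf = Counter(ngrams(docs[doc_index]))
    total = sum(tf.values())
    n_docs = len(docs)

    scores = {
        gram: (count / total) * math.log(n_docs / df[gram])
        for gram, count in tf.items()
    }
    # Sort by descending score; break ties by candidate for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Toy corpus of pre-segmented documents (hypothetical data).
docs = [
    ["earthquake", "relief", "team", "arrives", "earthquake", "zone"],
    ["stock", "market", "closes", "higher"],
    ["relief", "team", "visits", "market"],
]
top = tfidf_keywords(docs, 0)
```

In this toy corpus the unigram `("earthquake",)` ranks first for document 0, because it occurs twice there and in no other document. For real Chinese news text the token lists would come from a word segmenter, which is where the incomplete-word problem mentioned in the abstract arises.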

Bibliographic details

Source: Wuhan University Journal of Natural Sciences, 2007, Issue 5, 5 pages
Authors: LI Juanzi; FAN Qina; ZHANG Kuo
Original format: PDF
Language: English
Chinese Library Classification: Natural sciences, general
Keywords: keyword extraction; keyphrase extraction; news keyword

