An Estimation Method of the Words Tendency Based on Time-Series Variation

Abstract

The amount of electronic text has grown rapidly in recent years, and computers are increasingly used to process it. Word frequencies in these texts vary over time. Words whose frequencies change in this way are often treated as keywords because they are strongly related to the subject of the texts. However, traditional document processing systems ignore the time-series information of words when calculating their importance. This paper presents a method for estimating word trends that takes time-series variation into account. First, we give an example showing how the ranking produced by a similar-text retrieval system is rearranged when the time-series word-trend method is used instead of traditional methods. Using a decision tree, the proposed method classifies words into three classes, increasing, constant, and decreasing, which reflect how stable each word's usage is over time. The classifier is learned from five attribute values of each word: the slope and intercept of its regression line, the correlation coefficient, the angle between two regression lines, and special-noun attributes; it is then used to estimate the class of new words. These attributes are defined to measure the frequency change of each word quantitatively, and we found them effective in terms of recall and precision. For the evaluation, we obtained the attribute values of 1,069 proper nouns extracted from 8,216 CNN newspaper articles (1997-1999) on professional baseball; this data set is called the Learning-Data. After training the decision tree on these attribute values, we classified 696 proper nouns extracted from 1,272 CNN newspaper articles (2000); this data set is called the Test-Data. Comparing the decision tree's classifications with human judgments gave F-measures of 0.847, 0.851, and 0.768 for the increasing, constant, and decreasing classes, respectively.
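The abstract names the regression-based attributes but does not give their formulas. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: each word is assumed to be a series of frequencies, one per time period (the period granularity is an assumption), with the period index as x. The helper names (`fit_line`, `pearson`, `trend_attributes`, `f_measures`) are hypothetical; the "angle between two regression lines" is assumed here to compare lines fitted to the first and second halves of the series; the special-noun attributes and the decision tree itself are not reproduced. The last helper computes the standard per-class F-measure used to compare predicted classes with human labels.

```python
import math
from typing import Sequence


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Least-squares regression line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between time index and frequency."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if syy == 0:
        return 0.0  # a flat series has no linear association with time
    return sxy / math.sqrt(sxx * syy)


def trend_attributes(freqs: Sequence[float]) -> dict[str, float]:
    """Hypothetical helper: regression attributes of one word's frequency series.

    Assumption: the "angle between two regression lines" compares lines
    fitted to the first and second halves of the series.
    """
    if len(freqs) < 4:
        raise ValueError("need at least 4 periods so each half has 2 points")
    ts = list(range(1, len(freqs) + 1))  # period index 1..T used as x
    slope, intercept = fit_line(ts, freqs)
    half = len(freqs) // 2
    slope_early, _ = fit_line(ts[:half], freqs[:half])
    slope_late, _ = fit_line(ts[half:], freqs[half:])
    # Signed angle in degrees from the early line to the late line.
    angle = math.degrees(math.atan(slope_late) - math.atan(slope_early))
    return {
        "slope": slope,
        "intercept": intercept,
        "correlation": pearson(ts, freqs),
        "angle_deg": angle,
    }


def f_measures(predicted: Sequence[str], gold: Sequence[str],
               classes=("increasing", "constant", "decreasing")) -> dict[str, float]:
    """Per-class F-measure: harmonic mean of precision and recall."""
    pairs = list(zip(predicted, gold))
    scores = {}
    for c in classes:
        tp = sum(1 for p, g in pairs if p == c and g == c)
        fp = sum(1 for p, g in pairs if p == c and g != c)
        fn = sum(1 for p, g in pairs if p != c and g == c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[c] = 2 * precision * recall / total if total else 0.0
    return scores


# Usage with made-up frequencies (not data from the paper):
print(trend_attributes([2, 4, 6, 6, 4, 2]))
```

In this sketch, a word that rises and then falls gets an overall slope near zero but a large negative angle between its two half-series lines, which is the kind of signal a single regression line would miss. A decision tree would then be trained on such attribute vectors together with the special-noun attributes; that step is not shown.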

Bibliographic Record

Source: International NAISO (Natural & Artificial Intelligence Systems Organization) Congress on Information Science Innovations, Mar 17-21, 2001, Dubai, U.A.E. | 2001 | pp. 668-674 | 7 pages
Conference location: Dubai (AE)
Authors: El-Sayed Atlam; Makoto Okada; Masami Shishibori; Jun-ichi Aoe
Affiliation: Dept. of Information Science and Intelligent Systems, University of Tokushima, Tokushima 770-8506, Japan

Original format: PDF
Language: English
Chinese Library Classification: Automation technology; computer technology
Keywords: time-series variation; words trend; decision tree; CNN newspaper

Similar Literature

1. An Analytic Model to Represent Relation between Finish Date of Job-Hunting and Time-Series Variation of Entry Tendencies [J]. Seiya Nagamori, Kenta Mikawa, Masayuki Goto. Industrial Engineering & Management Systems, 2019, No. 3.
2. An Analysis of the Emotional Tendency of New Words in Chinese Text Based on Word2Vec [J]. Jiang Quan, Rao Wenbi. Computer Science & Information Technology, 2020, No. 4.
3. A Province-Scale Maize Yield Estimation Method Based on TM and Modis Time-Series interpolation [J]. Xin He, Yuanshu Jing, Xiaohe Gu. Sensor Letters: A Journal Dedicated to all Aspects of Sensors in Science, Engineering, and Medicine, 2010, No. 1.
4. An estimation method of the words tendency based on time-series variation [C]. El-Sayed Atlam, Makoto Okada, Masami Shishibori. International NAISO Congress on Information Science Innovations, 2001.
5. A study of variational phase estimation methods for synthetic aperture radar applications [D]. Sartor, Kenneth James Wesley. 2007.
6. A method for the estimation of functional brain connectivity from time-series data [O]. A. Wilmer, M. H. E. de Lussanet, M. Lappe. 2010.
7. Parameter Estimation from Time-Series Data with Correlated Errors: A Wavelet-Based Method and Its Application to Transit Light Curves [O]. Joshua A. Carter, Joshua N. Winn. 2009.
