Computer Science On-line Conference

The Comparison of Effects of Relevant-Feature Selection Algorithms on Certain Social-Network Text-Mining Viewpoints


Abstract

This research addresses a well-known problem in text mining: the high computational complexity caused by the many irrelevant features (terms, words), which act as noise from the classification point of view and drive time and memory requirements non-linearly. Using a set of real-world textual documents, freely written in English and labelled by sentiment, drawn from three selected and extensively tracked Internet sources, a group of available relevant-feature selection algorithms (Gain Ratio, Chi Square, Info Gain, Symmetrical Uncertainty, Winnow, One R, Relief F, Principal Components, SVM, LSA) was tested on 10,000, 25,000, and 50,000 social-network entries. All the algorithms produced very similar sets of relevant features; typically, only the significance rank of the features differed slightly. Except for a few slower algorithms, the term-preselection time ranged from seconds and minutes to a couple of hours. After using only the relevant fraction of the features instead of all of them, the entry length decreased by several orders of magnitude, particularly for the larger data sets with very high dimensionality. Despite this extremely strong reduction in the number of words, the classification accuracy remained the same regardless of which relevant-feature selection algorithm was chosen.
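As an illustrative sketch only (the paper does not disclose its implementation), one of the listed scoring criteria, Info Gain, can be computed for each term as IG(t) = H(C) - H(C|t), where C is the class label and t indicates the term's presence in a document. The toy corpus, labels, and function names below are hypothetical:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(C) of a list of class labels, in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(docs, labels, term):
    """IG(term) = H(C) - H(C | term present/absent)."""
    with_t = [l for d, l in zip(docs, labels) if term in d]
    without_t = [l for d, l in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_t) / n) * entropy(with_t) \
                + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional

# Tiny hypothetical sentiment-labelled corpus; each document is a set of terms.
docs = [
    {"great", "movie", "loved"},      # positive
    {"loved", "the", "acting"},       # positive
    {"terrible", "movie", "boring"},  # negative
    {"boring", "and", "terrible"},    # negative
]
labels = ["pos", "pos", "neg", "neg"]

# Rank the vocabulary by Info Gain and keep only the top-k relevant terms.
vocab = set().union(*docs)
ranked = sorted(vocab, key=lambda t: info_gain(docs, labels, t), reverse=True)
top_k = ranked[:3]
```

Discriminative terms such as "loved" reach the maximum gain of 1 bit here, while terms spread evenly across both classes (e.g. "movie") score 0; keeping only the top-ranked fraction of the vocabulary is what shortens the entries so drastically in the study.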
