Research on Data Cleaning in Text Clustering


Abstract

A more reasonable data cleaning method is proposed to address a problem in current text clustering preprocessing: data cleaning mistakenly removes words that have discriminative power. The method takes into account the emergence of new domain words. For rare-word filtering, it considers both the importance of a word in the whole text collection, namely its word frequency, and its importance in the text in which it appears, namely its weight. In this way the method avoids assigning such words to an existing category, so that filtering is comparatively accurate and the text clustering result is more precise. Finally, text clustering is performed with the C-means algorithm, and the results verify that the method improves the accuracy of text clustering.
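
The rare-word filtering idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the weighting formula or thresholds, so the function name `filter_rare_words`, the use of relative term frequency as the in-document weight, and the parameters `min_freq` and `min_weight` are all assumptions made for this example.

```python
from collections import Counter


def filter_rare_words(docs, min_freq=3, min_weight=0.3):
    """Return the set of words kept after rare-word filtering.

    docs is a list of token lists. A word is kept if it is frequent in
    the whole collection (count >= min_freq), or if it is rare overall
    but carries a high weight in at least one document.
    Assumption: the in-document weight is the relative term frequency,
    a stand-in for whatever weighting the paper actually uses.
    """
    # Importance in the whole collection: total word frequency.
    collection_freq = Counter(tok for doc in docs for tok in doc)

    # Importance inside a document: highest weight seen for each word.
    max_weight = {}
    for doc in docs:
        if not doc:
            continue
        for word, count in Counter(doc).items():
            weight = count / len(doc)
            max_weight[word] = max(max_weight.get(word, 0.0), weight)

    return {
        word
        for word, freq in collection_freq.items()
        if freq >= min_freq or max_weight[word] >= min_weight
    }


# Toy corpus (invented for illustration): "blockchain" stands for a new
# domain word, "teh" for a one-off typo.
docs = [
    ["text", "cluster", "data", "blockchain", "blockchain"],
    ["text", "cluster", "data", "clean", "word", "filter", "teh"],
    ["text", "cluster", "clean", "word", "filter"],
    ["data", "clean", "word", "filter", "cluster"],
]

kept = filter_rare_words(docs)
frequency_only = filter_rare_words(docs, min_weight=2.0)  # weight test disabled
```

In this toy corpus a frequency-only filter discards both "blockchain" and "teh", while the combined test keeps "blockchain" because it dominates the one document it appears in. The kept vocabulary would then be used to build the document vectors passed to the clustering step.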

Bibliographic record

Source
International Forum on Information Technology and Applications, 2010, 3 pages
Authors
Yuhang Zhang; Yue Wang; Wei Yang
Original format: PDF
Chinese Library Classification: G202-53
Keywords
data cleaning; text clustering; weighting; word frequency

Similar literature

1. Application of Efficient Data Cleaning Using Text Clustering for Semistructured Medical Reports to Large-Scale Stool Examination Reports: Methodology Study [J]. Hyunki Woo, Kyunga Kim, KyeongMin Cha. Journal of Medical Internet Research, 2019, No. 1
2. Clustering high dimensional data: Examining differences and commonalities between subspace clustering and text clustering - A position paper [J]. Hans-Peter Kriegel, Eirini Ntoutsi. SIGKDD Explorations, 2013, No. 2
3. Mining Text Data using different Text Clustering Techniques [J]. Ratna S. Patil, Prof. B. S. Chordia. International Journal of Computer Trends and Technology, 2017, No. 2
4. Research on Data Cleaning in Text Clustering [C]. Yuhang Zhang, Yue Wang, Wei Yang. 2010 International Forum on Information Technology and Applications, 2010
5. Scaling the Technology Opportunity Analysis text data mining methodology: Data extraction, cleaning, online analytical processing analysis, and reporting of large multi-source datasets [D]. George, Richard Peyton. 2006
6. Cleaning by clustering: methodology for addressing data quality issues in biomedical metadata [O]. Wei Hu, Amrapali Zaveri, Honglei Qiu. 2017
7. Object Oriented Intelligent Multi-Agent System Data Cleaning Architecture to clean Preference based Text Data [O]. Dr. G. Arumugam, T. Joshva Devadas, Madurai Kamaraj. 2011
