Punjabi Text Clustering by Sentence Structure Analysis

Saurabh Sharma; Vishal Gupta

Abstract

Punjabi text document clustering groups documents that share the same topics into clusters; this work does so by analyzing the sentence structure of the documents. The prevalent algorithms in this field use the vector space model, which treats each document as a bag of words. However, meaning in natural language inherently depends on word sequence, which bag-of-words clustering ignores. This paper presents a new Punjabi text clustering algorithm, Clustering by Sentence Structure Analysis (CSSA), applied to 221 Punjabi news articles collected from news sites. Phrases are extracted for processing through a careful analysis of sentence structure using the basic grammatical rules of Karaka. Sequences formed from these phrases are used to identify each document's topic and to measure similarity among all documents, which yields meaningful clusters.
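The abstract contrasts bag-of-words similarity with similarity over word sequences. The Python sketch below illustrates that difference and one simple way to group documents by shared sequences. It is not the authors' CSSA method: Karaka-based phrase extraction is not modeled, word bigrams stand in for the extracted phrase sequences, and the Jaccard measure, the single-pass leader clustering, the 0.3 threshold, and the English example strings (placeholders for tokenized Punjabi text) are all hypothetical choices.

```python
from collections import Counter
from math import sqrt


def bag_of_words(text):
    """Vector space model: word counts only, word order is discarded."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_product = sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm_product if norm_product else 0.0


def word_sequences(text, n=2):
    """Hypothetical stand-in for extracted phrase sequences: the set of word n-grams."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def sequence_similarity(a, b, n=2):
    """Jaccard overlap of the two documents' n-gram sets (order-sensitive)."""
    sa, sb = word_sequences(a, n), word_sequences(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cluster(docs, threshold=0.3, n=2):
    """Single-pass leader clustering (hypothetical choice): add each document to
    the first cluster whose first member is similar enough, else start a new one."""
    clusters = []
    for i, doc in enumerate(docs):
        for members in clusters:
            if sequence_similarity(doc, docs[members[0]], n) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    a, b = "dog bites man", "man bites dog"
    print(cosine(bag_of_words(a), bag_of_words(b)))  # 1.0: identical bags of words
    print(sequence_similarity(a, b))                 # 0.0: no shared word order

    # English placeholders for tokenized Punjabi news headlines.
    docs = [
        "farmers protest new farm law",
        "farmers protest new farm law in punjab",
        "cricket team wins the final match",
        "cricket team wins the series",
    ]
    print(cluster(docs))  # [[0, 1], [2, 3]]
```

Under the bag-of-words model, "dog bites man" and "man bites dog" are indistinguishable, while their bigram overlap is zero; this order sensitivity is the property the abstract argues a sentence-structure approach should capture.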

Bibliographic Details

Source: Computer Science & Information Technology, 2012, No. 4, 8 pages
Authors: Saurabh Sharma; Vishal Gupta
Original format: PDF
Chinese Library Classification (CLC): Computing technology, computer technology

Similar Documents

1. Text Summarization Based on Sentence Clustering with Rhetorical Structure Information [J]. Sa-Kwang Song, Dong Hyun Jang, Sung Hyon Myaeng. International Journal of Computer Processing of Oriental Languages, 2005, No. 2.
2. Clustering Sentence-Level Text Using a Novel Fuzzy Relational Clustering Algorithm [J]. Andrew Skabar, Khaled Abdalgader. IEEE Transactions on Knowledge and Data Engineering, 2013, No. 1.
3. Effect of Statistical POS Tagger on Syntactic Analysis of Punjabi Sentences [J]. Sanjeev Kumar Sharma. Indian Journal of Science and Technology, 2016, No. 32.
4. Domain Based Punjabi Text Document Clustering [C]. Saurabh Sharma, Vishal Gupta. International Conference on Computational Linguistics, 2012.
5. Text Association Mining with Cross-Sentence Inference, Structure-Based Document Model and Multi-Relational Text Mining [D]. Supphachai Thaicharoen, 2009.
6. Special Section: Current Status and Future Directions of the Analysis of Verbal Behavior: Sentence and Sentence Structure in the Analysis of Verbal Behavior [O]. Ullin T. Place, 1998.
7. Punjabi Text Clustering by Sentence Structure Analysis [O]. Saurabh Sharma, Vishal Gupta, 2013.
8. A Sentence-to-Sentence Clustering Procedure for Pattern Analysis [R]. S. Y. Lu, K. S. Fu, 1977.