Conference: International Conference on Emerging Trends in Engineering and Technology - Signal and Information Processing

Hindi Multi-document Word Cloud based Summarization through Unsupervised Learning

Abstract

Managing documents is a critical task that supports many applications, ranging from information retrieval to clustering search engine results. The multilingual support offered by websites makes Hindi a major language in the digital domain of information technology today. This work focuses on document management and summarization of a Hindi corpus. The objective is to manage the documents and summarize the Hindi corpus by applying token extraction and document clustering. The approach offers better scalability and maintains consistent cluster quality for incremental data sets. Most past and contemporary research has targeted document management for English corpora. Researchers have used Hindi corpora mainly to explore stemming, single-document summarization, and classifier design. Implementing unsupervised learning on a Hindi corpus to summarize multiple documents through a Word Cloud remains an unexplored area. Technically, the current work is an application of TF-IDF, cosine-based document similarity measures, and cluster dendrograms, in addition to various other Natural Language Processing (NLP) activities. Entropy and precision are used to evaluate the experiments carried out on different live and available/tested datasets and results
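The abstract names TF-IDF, cosine similarity, and cluster dendrograms but gives no implementation details. The following is a minimal sketch of how those three pieces might fit together, not the authors' method. The whitespace tokenizer, the `log(N/df)` weighting, the average-linkage merge rule, the function names, and the four sample documents are all assumptions made for illustration; stemming and stop-word removal are omitted.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    # Assumption: plain whitespace tokens, no stemming or stop-word removal.
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def agglomerate(vectors):
    """Average-linkage agglomerative clustering.

    Returns the merge history as (cluster_a, cluster_b, similarity) tuples,
    which is the information a dendrogram plot is drawn from.
    """
    clusters = [[i] for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [cosine(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b]]
                avg = sum(sims) / len(sims)
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        avg, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), avg))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

# Hypothetical toy corpus: two cricket snippets and two weather snippets.
docs = [
    "भारत क्रिकेट मैच जीत",
    "क्रिकेट मैच टीम भारत",
    "मौसम बारिश दिल्ली आज",
    "दिल्ली बारिश मौसम ठंड",
]

for left, right, sim in agglomerate(tfidf_vectors(docs)):
    print(left, right, round(sim, 3))
```

On this toy corpus the two cricket documents and the two weather documents merge with each other before the two groups are joined, because the groups share no terms. The highest-weighted terms of each resulting cluster could then feed a word cloud, though how the paper selects those terms is not stated in the abstract.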
