Conference: Document Recognition and Retrieval XIII; Electronic Imaging Science and Technology

Document Clustering: Applications in a Collaborative Digital Library



Abstract

This paper introduces a document clustering method for FileShare®, a commercial collaborative digital library that lets groups of people working on common projects share and access documents through a simple Internet browser (e.g., Microsoft® Internet Explorer®, Netscape®, or Opera®). As the number of documents in a digital library grows, displaying them in this environment becomes a major challenge. The proposed method uses a modified version of the traditional K-Means algorithm, together with lexical chaining, to categorize documents in the FileShare® repository by theme. The algorithm is unsupervised and has shown very high accuracy in a typical experimental setup.
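The abstract does not describe how K-Means is modified or how the lexical chains are built, so the following is only a minimal sketch of the unmodified baseline: plain K-Means over bag-of-words term counts with cosine similarity. The names `tf_vector`, `cosine`, `mean_vector`, `kmeans`, and the `seeds` parameter are hypothetical and are not taken from the paper; the term counts merely stand in for whatever lexical-chain features the authors use.

```python
import math
from collections import Counter

def tf_vector(text):
    """Term counts for one document (assumed stand-in for lexical-chain features)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of term-count vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})

def kmeans(docs, k, seeds, max_iter=20):
    """Plain K-Means; seeds are indices of the documents used as initial centroids."""
    vectors = [tf_vector(d) for d in docs]
    centroids = [vectors[i] for i in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
        # Update step: recompute each non-empty cluster's centroid.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels

docs = [
    "budget forecast revenue quarterly budget",
    "revenue forecast for the quarterly report",
    "server network outage network latency",
    "network latency and server downtime",
]
labels = kmeans(docs, k=2, seeds=[0, 2])
print(labels)  # [0, 0, 1, 1]
```

Seeding from fixed document indices keeps the toy example deterministic; a real repository would need a more careful initialization and a better text representation than raw word counts.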
