Proceedings of the 1990 ACM annual conference on Cooperation

Nonparametric methods for automatic classification of documents and transactions (abstract)

Abstract

How to classify documents is a central problem in document retrieval. The classification problem can be stated as follows. There exists a large collection of documents, each of which contains a set of terms. How should the documents be clustered to allow the selection of index terms so that the collection can be searched to the maximal collective benefit of the retrieval system's customers?

Traditionally, transaction functionalities are manually scheduled into deferred and immediate queues for processing, without any special consideration given to the interwoven functionalities invoked by the different user groups. How to classify transactions for the concurrency controller in a distributed system is a major problem in transaction scheduling. The problem here is: how should transaction functionalities be scheduled for processing to satisfy the requirements of the different user groups? That is, how should transaction functionalities be organized on disk to minimize disk access time, in the hope of fulfilling the requirements of individual user groups?

This paper presents nonparametric algorithms and a heuristic for the automatic classification of documents according to the similarity of their keywords, the words likely to be useful as index terms for the document set. The normal approximation to the binomial distribution was explored as an index for automatic classification of documents and transactions. A nonparametric measure of association consistent with the Cramér statistic was used to examine similarities among documents. A nonparametric analysis of variance procedure was developed for comparing the profiles of term frequencies between documents, or of transaction functionalities invoked between users. The usefulness of the heuristic in the automatic classification of user groups according to the transaction functionalities that they invoke in a distributed system is discussed.
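
The abstract names two standard statistical tools without giving formulas. As a rough illustration only, the sketch below uses the textbook definitions: a z-score from the normal approximation to the binomial distribution, and Cramér's V computed from a documents-by-terms contingency table. The function names (`binomial_z`, `term_table`, `cramers_v`) and the way documents are turned into a table are assumptions made for this example, not the procedure of the paper.

```python
import math
from collections import Counter


def binomial_z(k, n, p):
    """z-score of k successes in n trials under Binomial(n, p),
    using the normal approximation (mean n*p, variance n*p*(1-p))."""
    if n <= 0 or not 0.0 < p < 1.0:
        raise ValueError("need n > 0 and 0 < p < 1")
    return (k - n * p) / math.sqrt(n * p * (1.0 - p))


def term_table(*documents):
    """Hypothetical helper: build a documents-by-terms count table.
    Each document is a list of keyword strings."""
    counts = [Counter(doc) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    return [[c[term] for term in vocabulary] for c in counts]


def cramers_v(table):
    """Textbook Cramér's V for an r x c table of non-negative counts:
    sqrt(chi2 / (n * (min(r, c) - 1)))."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    k = min(len(row_totals), len(col_totals)) - 1
    if k < 1 or 0 in row_totals or 0 in col_totals:
        raise ValueError("need at least a 2 x 2 table with no empty row or column")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Usage: two documents with the same keyword profile versus two with
# disjoint keywords. V near 0 means the term profiles do not depend on
# the document; V near 1 means they differ as much as possible.
same = cramers_v(term_table(["index", "term", "term"], ["index", "term", "term"]))
disjoint = cramers_v(term_table(["index", "index", "term"], ["queue", "queue", "disk"]))
```

Under these assumptions, a low V between two documents suggests they could share a cluster, and a large absolute z-score flags a term whose frequency in a document departs from its collection-wide rate. How the paper actually combines these quantities is not stated in the abstract.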
