International Conference on Web Information Systems and Technologies

The GENIE System: Classifying Documents by Combining Mixed-Techniques


Abstract

Automatic text classification remains an open problem, and deploying it in companies and organizations that hold large volumes of text data is not trivial. To achieve optimum results, many parameters come into play, such as the language, the context, the level of knowledge of the topics discussed, the format of the documents, and the type of language used in the documents to be classified. In this paper we describe GENIE, a multi-language, rule-based pipeline system for automatic document categorization. We tested the real capabilities of our proposal on several business corpora, and we compared the results of applying different stages of the pipeline to the same data in order to measure the influence of each step on the categorization process. The results obtained are very promising, and the GENIE system is already in use in real production environments with very good results.
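To make the idea of a staged, rule-based pipeline concrete, the following is a minimal sketch in Python. It is not the GENIE implementation: the stage functions (`lowercase`, `drop_stopwords`, `strip_plural`), the `RULES` table, and the sample document are all hypothetical and chosen only to illustrate how running different subsets of stages over the same text can change the assigned categories.

```python
from typing import Callable, Dict, List, Set

# A stage takes a list of tokens and returns a transformed list of tokens.
Stage = Callable[[List[str]], List[str]]


def lowercase(tokens: List[str]) -> List[str]:
    """Hypothetical stage: normalise case."""
    return [t.lower() for t in tokens]


def drop_stopwords(tokens: List[str]) -> List[str]:
    """Hypothetical stage: remove a tiny, illustrative stopword list."""
    stop = {"the", "a", "of", "and", "for"}
    return [t for t in tokens if t not in stop]


def strip_plural(tokens: List[str]) -> List[str]:
    """Hypothetical stage: crude plural stripping (not a real stemmer)."""
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


# Hypothetical rules: a category fires if any of its keywords is present.
RULES: Dict[str, Set[str]] = {
    "finance": {"invoice", "payment", "bank"},
    "sports": {"match", "goal", "team"},
}


def classify(text: str, stages: List[Stage]) -> List[str]:
    """Run the given stages in order, then apply the keyword rules."""
    tokens = text.split()
    for stage in stages:
        tokens = stage(tokens)
    present = set(tokens)
    return sorted(label for label, keywords in RULES.items() if keywords & present)


doc = "Payments for the Invoices of the Team"
full_pipeline = [lowercase, drop_stopwords, strip_plural]

print(classify(doc, full_pipeline))  # ['finance', 'sports']
print(classify(doc, [lowercase]))    # ['sports']
print(classify(doc, []))             # []
```

Comparing the three calls shows, on a toy scale, the kind of stage-by-stage comparison the abstract mentions: with no preprocessing no rule fires, case normalisation alone recovers one category, and the full set of stages recovers both.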
