Source: IEEE Transactions on Systems, Man, and Cybernetics, Part B

An automatic indexing and neural network approach to concept retrieval and classification of multilingual (Chinese-English) documents

Abstract

An automatic indexing and concept classification approach to a multilingual (Chinese and English) bibliographic database is presented. We introduced a multi-linear term-phrasing technique to extract concept descriptors (terms or keywords) from a Chinese-English bibliographic database. A concept space of related descriptors was then generated using a co-occurrence analysis technique. Like a man-made thesaurus, the system-generated concept space can be used to suggest additional semantically relevant search terms. For concept classification and clustering, a variant of a Hopfield neural network was developed to cluster similar concept descriptors and to generate a small number of concept groups that represent (summarize) the subject matter of the database. The authors have applied the concept space approach to information classification and retrieval in other scientific databases and business applications, but multilingual information retrieval presents a unique challenge. This research reports our experiment on multilingual databases. Our system was initially developed in the MS-DOS environment, running the ETEN Chinese operating system. For performance reasons, it was then tested on a UNIX-based system. Because Chinese is an ideographic language, a Chinese term-phrase indexing paradigm that accounts for its ideographic characteristics was developed as a multilingual information classification model. By applying the neural-network-based concept classification technique, the model presents a novel way of organizing unstructured multilingual information.
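To make the concept space idea concrete, here is a minimal sketch of co-occurrence analysis over descriptor lists. It is an illustration under stated assumptions, not the authors' method: the abstract does not give the weighting formula, so the sketch assumes a simple asymmetric weight (documents containing both descriptors divided by documents containing the first). The function names `build_concept_space` and `related_terms` and the toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_concept_space(docs):
    """Return asymmetric co-occurrence weights between descriptors.

    docs: list of descriptor lists, one list per document.
    Assumed weighting (not taken from the paper):
    weights[a][b] = (docs containing both a and b) / (docs containing a)
    """
    doc_freq = Counter()
    pair_freq = Counter()
    for descriptors in docs:
        unique = sorted(set(descriptors))  # count each descriptor once per doc
        doc_freq.update(unique)
        pair_freq.update(combinations(unique, 2))

    weights = {term: {} for term in doc_freq}
    for (a, b), n in pair_freq.items():
        weights[a][b] = n / doc_freq[a]
        weights[b][a] = n / doc_freq[b]
    return weights


def related_terms(weights, term, top_k=3):
    """Suggest up to top_k descriptors most strongly associated with term."""
    neighbours = weights.get(term, {})
    # Highest weight first; ties broken alphabetically for determinism.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_k]]


# Hypothetical toy records mixing Chinese and English descriptors.
docs = [
    ["neural network", "神经网络", "information retrieval"],
    ["neural network", "神经网络", "classification"],
    ["information retrieval", "信息检索", "indexing"],
    ["information retrieval", "信息检索", "neural network"],
]
weights = build_concept_space(docs)
suggestions = related_terms(weights, "神经网络")
```

With these toy records, a query on the Chinese descriptor surfaces its English counterpart first, which is the thesaurus-like behaviour the abstract describes. The weights are asymmetric by design: a rare descriptor can point strongly to a common one without the reverse being true.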
