
A method for improving the accuracy of automatic indexing of Chinese-English mixed documents


Abstract

Purpose: This paper presents a method for improving the accuracy of automatic indexing of Chinese-English mixed documents.

Design/methodology/approach: Based on the inherent characteristics of Chinese-English mixed texts and on cybernetics theory, we propose an integrated control method for indexing documents. It consists of "feed-forward control", "in-progress control" and "feed-back control", and aims to improve the accuracy of automatic indexing of Chinese-English mixed documents. An experiment was conducted to investigate the effect of the proposed method.

Findings: The method distinguishes Chinese from English text by grammatical structure and word formation rules. Applying it in the three phases of automatic indexing of Chinese-English mixed documents gave encouraging results: precision increased from 88.54% to 97.10%, and recall improved from 97.37% to 99.47%.

Research limitations: The indexing method is relatively complicated, and the whole indexing process requires substantial human intervention. Because pattern matching relies on a brute-force (BF) approach, indexing efficiency is reduced to some extent.

Practical implications: The research has both theoretical significance and practical value for improving the accuracy of automatic indexing of multilingual documents (not only Chinese-English mixed documents). The proposed method will benefit the indexing of life science documents as well as documents in other subject areas.

Originality/value: So far, few studies have been published on methods for increasing the accuracy of multilingual automatic indexing. This study provides insights into the automatic indexing of multilingual documents, especially Chinese-English mixed documents.
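The abstract names brute-force (BF) pattern matching and reports precision and recall, but does not spell out the procedure. The following is a minimal sketch of those two ideas only, not the authors' implementation: the function names, the sample sentence, and the vocabulary are all hypothetical, and matching here is case-sensitive.

```python
def brute_force_find(text: str, pattern: str) -> list[int]:
    """Return every start offset where pattern occurs in text (naive BF scan)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    hits = []
    for i in range(n - m + 1):
        # Compare the pattern character by character at offset i.
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            hits.append(i)
    return hits


def index_terms(text: str, vocabulary: list[str]) -> set[str]:
    """Assign every vocabulary term that occurs at least once in text."""
    return {term for term in vocabulary if brute_force_find(text, term)}


def precision_recall(assigned: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = correct / assigned; recall = correct / relevant."""
    correct = len(assigned & relevant)
    precision = correct / len(assigned) if assigned else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical mixed Chinese-English sentence and controlled vocabulary.
text = "基因表达分析使用PCR技术检测DNA序列"
vocabulary = ["基因表达", "PCR", "DNA", "蛋白质"]
assigned = index_terms(text, vocabulary)
```

Python strings are sequences of Unicode code points, so the same scan handles Chinese characters and Latin letters without special casing. The nested comparison costs O(n·m) per term in the worst case, which is consistent with the efficiency limitation the abstract mentions.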

Bibliographic details

  • Source
    《中国文献情报:英文版》 | 2012, Issue 4 | pp. 77-92 | 16 pages
  • Authors

    Yan ZHAO; Hui SHI

  • Author affiliations

    College of International Business, Shanghai International Studies University;

    Center for E-government Internationalization Research, Shanghai International Studies University;

    College of English Language and Literature, Shanghai International Studies University;

    Department of Foreign Languages, Taiyuan Normal University

  • Indexing information
  • Original format: PDF
  • Text language: CHI
  • Chinese Library Classification: Retrieval machines
  • Keywords

    Chinese-English mixed documents; String matching; Ac

