Research and Implementation of Fragmented Indexing for Press and Publication Industry Standards

Published in: Computer Engineering and Design (《计算机工程与设计》)

Abstract

In the press and publication industry, standards are currently indexed only by collecting their structured (descriptive) information. The specific content of each standard is not indexed, so users cannot quickly locate the relevant content of a standard when searching. To solve this problem, a "fragmented" indexing scheme for press and publication standards is proposed. The structure and content of each standard are stored according to the standard's characteristics. A thesaurus of press and publication terms is built and used as the basis for word segmentation, which improves segmentation accuracy. A statistical weighting algorithm that takes word frequency, part of speech, word length, and location as weighting factors is then used for automatic indexing. Experimental results show that the scheme achieves fragmented indexing of press and publication standards and improves both the efficiency and the quality of retrieval.
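The abstract does not state the exact segmentation algorithm or weighting formula. As a rough illustration only, the sketch below assumes forward maximum matching against the thesaurus and a per-occurrence weight of location × part of speech × word length, summed over occurrences so that word frequency is counted too. Each clause of a standard is treated as one "fragment". The toy thesaurus `THESAURUS`, the weight tables `POS_WEIGHT` and `LOCATION_WEIGHT`, and the helpers `fmm_segment` and `index_clause` are hypothetical, not the paper's.

```python
from collections import defaultdict

# Hypothetical toy thesaurus mapping term -> part of speech ("n" noun, "v" verb).
# The paper's real press-and-publication lexicon is not reproduced here.
THESAURUS = {
    "新闻出版": "n",  # press and publication
    "新闻": "n",      # news
    "出版": "v",      # publish
    "标准": "n",      # standard
    "碎片化": "n",    # fragmentation
    "标引": "v",      # index (verb)
    "检索": "v",      # retrieve
}

# Assumed weight tables; the paper's actual coefficients are unknown.
POS_WEIGHT = {"n": 1.0, "v": 0.6}
LOCATION_WEIGHT = {"title": 2.0, "body": 1.0}  # clause title counts more than body text
MAX_WORD_LEN = 4  # longest entry in the toy thesaurus


def fmm_segment(text, lexicon, max_len=MAX_WORD_LEN):
    """Forward maximum matching: at each position, take the longest lexicon entry;
    fall back to a single character when nothing matches."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def index_clause(parts, lexicon, top_k=3):
    """Rank index terms for one clause (one "fragment" of a standard).

    parts is a list of (location, text) pairs, e.g. the clause title and body.
    Each occurrence adds location * POS * length weight, so repeated terms
    accumulate a higher score (the word-frequency factor).
    """
    scores = defaultdict(float)
    for location, text in parts:
        for word in fmm_segment(text, lexicon):
            if word not in lexicon:  # unmatched single characters are not index terms
                continue
            pos_w = POS_WEIGHT.get(lexicon[word], 0.2)  # low default for other POS tags
            len_w = min(len(word), MAX_WORD_LEN) / MAX_WORD_LEN  # longer terms are more specific
            scores[word] += LOCATION_WEIGHT[location] * pos_w * len_w
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


if __name__ == "__main__":
    clause = [
        ("title", "新闻出版标准碎片化标引"),
        ("body", "标准碎片化标引提高检索效率"),
    ]
    print(index_clause(clause, THESAURUS))  # ['碎片化', '新闻出版', '标准']
```

Summing per-occurrence weights is just one simple way to fold word frequency into the score. The paper may combine the four factors differently, for example with normalized term frequency or tuned coefficients.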
