Conference: BICA Society., Meeting

Development of Text Data Processing Pipeline for Scientific Systems

Abstract

The aim of this work was to develop a pipeline for processing scientific texts, including articles and abstracts, in order to categorize them, identify patterns, and build recommendations for users of scientific systems. The authors propose several text pre-processing methods and a method for cluster and classification analysis of texts, and have developed a recommendation software system for users of scientific publications. To solve the data preprocessing problem, a parametric approach is proposed for retrieving a new semantic feature from textual publications: the type of scientific result. Extraction of the scientific result type is based on the user's need for content with a specific property. To solve the problem of clustering user profiles, an ensemble method with a change of distance metric is proposed. For classification, an entropy-based ensemble method is used. The efficiency of the proposed methods and algorithms was evaluated on the search module of the information system of the "Technologies in Education" International Congress of Conferences. The author acknowledges support from the MEPhI Academic Excellence Project (Contract No. 02.a03.21.0005).
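The abstract does not say how entropy enters the classification ensemble. As a rough illustration only, the sketch below shows one common way such a scheme could work: each base classifier outputs a probability distribution over classes, and its vote is weighted by how confident (low-entropy) that distribution is. The weighting formula, the function names, and the example result-type labels ("method", "dataset", "survey") are all assumptions made for this sketch and are not taken from the paper.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution given as a dict."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def entropy_weighted_vote(predictions):
    """Combine class distributions from several classifiers.

    Assumed weighting (not from the paper): weight = 1 - H / H_max,
    so a classifier with a uniform (maximally uncertain) output
    contributes nothing and a fully confident one contributes fully.
    """
    combined = defaultdict(float)
    for probs in predictions:
        h_max = math.log(len(probs)) if len(probs) > 1 else 1.0
        weight = max(0.0, 1.0 - entropy(probs) / h_max)
        for label, p in probs.items():
            combined[label] += weight * p
    total = sum(combined.values())
    if total <= 0:  # every classifier was maximally uncertain
        return None, {}
    scores = {label: s / total for label, s in combined.items()}
    return max(scores, key=scores.get), scores


# Hypothetical outputs of three base classifiers for one publication.
predictions = [
    {"method": 0.90, "dataset": 0.05, "survey": 0.05},  # confident
    {"method": 0.40, "dataset": 0.35, "survey": 0.25},  # nearly uniform
    {"method": 0.20, "dataset": 0.70, "survey": 0.10},  # fairly confident
]
label, scores = entropy_weighted_vote(predictions)
print(label, scores)
```

With this weighting, the nearly uniform second classifier has almost no influence on the result, which is the intuition behind using entropy as a confidence measure in an ensemble.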
