AUTOMATIC GENERATION OF TRAINING DATA FOR SCIENTIFIC PAPER SUMMARIZATION USING VIDEOS

Abstract

Embodiments may provide techniques for generating training data for the summarization of complex documents, such as scientific papers and articles, that scale to large training sets. For example, in an embodiment, a method may be implemented in a computer system and may comprise: collecting a plurality of video and audio recordings of presentations of documents; collecting a plurality of documents corresponding to the video and audio recordings; converting the plurality of video and audio recordings into transcripts of the plurality of presentations; generating a summary of each document by selecting a plurality of sentences from that document using the transcript of its presentation; generating a dataset comprising a plurality of the generated summaries; and training a machine learning model using the generated dataset.
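
The sentence-selection step can be illustrated with a minimal sketch. The abstract does not say how the transcript guides the selection, so the scoring rule below (average transcript frequency of a sentence's content words) is an assumption made purely for illustration, not the claimed method. The names `split_sentences`, `content_words`, `summarize_with_transcript`, and `build_dataset`, the stopword list, and the `num_sentences` parameter are all hypothetical. Speech-to-text conversion and model training are outside the sketch; transcripts are assumed to be available already as plain strings.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the source).
STOPWORDS = frozenset({
    "a", "an", "the", "of", "to", "and", "in", "is", "are", "we",
    "that", "this", "for", "on", "with", "our", "it",
})


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercased alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS]


def summarize_with_transcript(document, transcript, num_sentences=2):
    """Pick the document sentences whose words the speaker used most.

    Assumed scoring rule: mean transcript count of the sentence's
    content words. Selected sentences are returned in document order.
    """
    transcript_counts = Counter(content_words(transcript))
    sentences = split_sentences(document)
    scored = []
    for idx, sentence in enumerate(sentences):
        words = content_words(sentence)
        if not words:
            continue
        score = sum(transcript_counts[w] for w in words) / len(words)
        scored.append((score, idx))
    # Highest score first; ties broken by earlier position in the document.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:num_sentences]
    return [sentences[idx] for idx in sorted(idx for _, idx in top)]


def build_dataset(pairs, num_sentences=2):
    """Turn (document, transcript) pairs into document/summary records."""
    return [
        {"document": doc,
         "summary": summarize_with_transcript(doc, talk, num_sentences)}
        for doc, talk in pairs
    ]
```

Each record produced by `build_dataset` pairs a document with an extractive summary, which is the kind of dataset the abstract describes feeding into model training. A practical system would likely need a more robust alignment between spoken and written text than raw word overlap, since speakers paraphrase heavily.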
