
METHOD AND APPARATUS FOR EXTRACTING TOPIC SENTENCES OF WEBPAGES


Abstract

In various embodiments, a method and an apparatus for extracting topic sentences of webpages are provided. The method comprises: obtaining candidate webpages and a pre-built machine learning model, where each candidate webpage contains multiple preselected candidate topic sentences and each candidate topic sentence includes several word segments; determining, for each candidate webpage, word feature values that indicate the importance levels of its word segments, and inputting the word feature values to the machine learning model to obtain an importance value for each word segment; for each candidate webpage, determining a partial order value for each candidate topic sentence according to the importance values of the word segments included in that sentence; and, for each candidate webpage, selecting as the target topic sentence one of the candidate topic sentences whose partial order value is larger than a preset threshold value.
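The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how the model is built, how the partial order value is computed from the word-segment importance values, or how one sentence is picked when several exceed the threshold. The sketch assumes a toy weighted-sum model (`linear_model`), takes the partial order value to be the mean importance of a sentence's word segments, and picks the highest-scoring sentence above the threshold. All function names and feature values here are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical stand-in for the pre-built machine learning model:
# it maps one word segment's feature vector to an importance value.
ImportanceModel = Callable[[Sequence[float]], float]


def linear_model(features: Sequence[float]) -> float:
    """Toy model (an assumption): a fixed weighted sum of the feature values."""
    weights = (0.5, 0.25)
    return sum(w * f for w, f in zip(weights, features))


def sentence_score(
    segments: List[str],
    features: Dict[str, Sequence[float]],
    model: ImportanceModel,
) -> float:
    """Partial order value, assumed here to be the mean segment importance."""
    if not segments:
        return 0.0
    return sum(model(features[s]) for s in segments) / len(segments)


def select_topic_sentence(
    candidates: List[List[str]],
    features: Dict[str, Sequence[float]],
    model: ImportanceModel,
    threshold: float,
) -> Optional[List[str]]:
    """Return the best candidate whose score exceeds the threshold, else None."""
    scored = [(sentence_score(c, features, model), i) for i, c in enumerate(candidates)]
    above = [(score, i) for score, i in scored if score > threshold]
    if not above:
        return None
    _, best_index = max(above, key=lambda pair: pair[0])
    return candidates[best_index]


# One candidate webpage: two preselected candidate sentences, already segmented.
page_candidates = [["python", "tutorial"], ["home", "page"]]
# Hypothetical per-page word feature values for each word segment.
page_features = {
    "python": (1.0, 1.0),
    "tutorial": (1.0, 0.0),
    "home": (0.0, 0.0),
    "page": (0.0, 1.0),
}

target = select_topic_sentence(page_candidates, page_features, linear_model, 0.5)
print(target)
```

With these made-up feature values, the first candidate has a mean importance of 0.625 and the second 0.125, so only the first exceeds the threshold of 0.5. Returning `None` when no candidate qualifies is a design choice of this sketch; the abstract does not describe that case.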

Bibliographic data

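  • Publication number: IN201614039494A
  • Patent type:
  • Publication date: 2017-06-16
  • Original format: PDF
  • Applicant/patentee:
  • Application number: IN201614039494
  • Inventors: LI CHENYAO; ZENG HONGLEI
  • Filing date: 2016-11-18
  • Classification: G06F15/16
  • Country: IN
  • Date indexed: 2022-08-21 13:38:39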
