IDLE TALK EXTRACTION SYSTEM, METHOD AND PROGRAM FOR EXTRACTING IDLE TALK PARTS FROM CONVERSATION

Abstract

PROBLEM TO BE SOLVED: To provide a technique for extracting idle talk parts from a conversation.

SOLUTION: An idle talk extraction system for extracting idle talk from a conversation comprises: a first corpus including documents in a plurality of fields; a second corpus including only documents in the field to which the conversation belongs; a determination part that, for the words included in the second corpus, determines as a lower limit subject word any word whose idf value for the first corpus and idf value for the second corpus are each below a first prescribed threshold value; a score calculation part that calculates a tf-idf value as the score for each word included in the second corpus and, for each lower limit subject word, uses a constant set as a lower limit instead of the tf-idf value; a clipping part that sequentially cuts out intervals to be processed from text data of the conversation contents; and an extraction part that extracts as an idle talk part any interval in which the average score of the words included in the interval is larger than a second prescribed threshold value.
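The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how tf and idf are computed, so the smoothed idf formula, the use of corpus-wide relative frequency as tf, the fixed-size sliding window, and the choice to skip words that have no score are all assumptions made here for illustration. Every function and parameter name (`idf`, `lower_limit_words`, `word_scores`, `intervals`, `extract_idle_talk`, `idf_threshold`, `lower_limit`, `score_threshold`) is hypothetical.

```python
import math
from collections import Counter


def idf(word, corpus):
    """Smoothed idf of `word` over `corpus`, a list of tokenized documents (assumed formula)."""
    df = sum(1 for doc in corpus if word in doc)
    return math.log((1 + len(corpus)) / (1 + df))


def lower_limit_words(general_corpus, domain_corpus, idf_threshold):
    """Determination part: domain-corpus words whose idf is below the threshold in BOTH corpora."""
    vocab = {w for doc in domain_corpus for w in doc}
    return {
        w for w in vocab
        if idf(w, general_corpus) < idf_threshold
        and idf(w, domain_corpus) < idf_threshold
    }


def word_scores(domain_corpus, lower_words, lower_limit):
    """Score calculation part: tf-idf per domain-corpus word, or the constant for lower limit words."""
    counts = Counter(w for doc in domain_corpus for w in doc)
    total = sum(counts.values())
    return {
        # tf is taken as relative frequency in the domain corpus (an assumption).
        w: lower_limit if w in lower_words else (c / total) * idf(w, domain_corpus)
        for w, c in counts.items()
    }


def intervals(tokens, size, step=1):
    """Clipping part: sequentially cut fixed-size windows out of the token stream."""
    for start in range(0, max(len(tokens) - size, 0) + 1, step):
        yield start, tokens[start:start + size]


def extract_idle_talk(tokens, scores, size, score_threshold, step=1):
    """Extraction part: keep windows whose mean word score exceeds the threshold."""
    hits = []
    for start, window in intervals(tokens, size, step):
        # Words without a score are skipped (an assumption; the abstract is silent on this).
        known = [scores[w] for w in window if w in scores]
        if known and sum(known) / len(known) > score_threshold:
            hits.append((start, window))
    return hits
```

In use, `general_corpus` would play the role of the first corpus and `domain_corpus` the second; the two thresholds and the lower limit constant would have to be tuned on real conversation data, which the abstract does not specify.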
