Data filtering in humor generation: comparative analysis of hit rate and co-occurrence rankings as a method to choose usable pun candidates

Abstract

In this paper we propose a method of filtering excessive amounts of textual data acquired from the Internet. In our research on pun generation in Japanese, we experienced problems with excessively long data processing times, caused by the number of phonetic candidates (i.e. phrases that can be used to generate actual puns) generated by our system. A simple, naive approach, in which we take into consideration only the phrases with the highest occurrence on the Internet, can result in the deletion of candidates that are actually usable. Thus, we propose a data filtering method in which we compare two Internet-based rankings, a co-occurrence ranking and a hit rate ranking, and select only those candidates that occupy the same or similar positions in both rankings. In this work we analyze the effects of such data reduction, considering five cases: when the candidates are in exactly the same position in both rankings, and when their positions differ by 1, 2, 3 and 4. The analysis is conducted on data acquired by comparing pun candidates generated by the system (and filtered with our method) with phrases that were actually used in puns created by humans. The results show that the proposed method can be used to filter excessive amounts of textual data acquired from the Internet.
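The rank-comparison filter described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`rank_positions`, `filter_candidates`), the alphabetical tie-breaking rule, the restriction to candidates present in both score tables, and all the sample scores are assumptions made purely for illustration.

```python
from typing import Dict, List


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each candidate to its 1-based position, highest score first.

    Ties are broken alphabetically (an assumption; the abstract does not
    say how ties are handled).
    """
    ordered = sorted(scores, key=lambda cand: (-scores[cand], cand))
    return {cand: pos for pos, cand in enumerate(ordered, start=1)}


def filter_candidates(
    hit_rate: Dict[str, float],
    cooccurrence: Dict[str, float],
    max_diff: int = 0,
) -> List[str]:
    """Keep candidates whose positions in the two rankings differ by at most max_diff."""
    # Only rank candidates that have a score in both tables (assumption).
    shared = hit_rate.keys() & cooccurrence.keys()
    hit_pos = rank_positions({c: hit_rate[c] for c in shared})
    co_pos = rank_positions({c: cooccurrence[c] for c in shared})
    kept = [c for c in shared if abs(hit_pos[c] - co_pos[c]) <= max_diff]
    # Return survivors in hit-rate ranking order for readability.
    return sorted(kept, key=lambda c: hit_pos[c])


if __name__ == "__main__":
    # Hypothetical scores, invented for this example only.
    hit_rate = {"cand_a": 900, "cand_b": 500, "cand_c": 300, "cand_d": 100, "cand_e": 50}
    cooccurrence = {"cand_a": 40, "cand_b": 5, "cand_c": 30, "cand_d": 10, "cand_e": 1}

    # The five cases from the abstract: identical position, then a gap of 1 to 4.
    for max_diff in range(5):
        print(max_diff, filter_candidates(hit_rate, cooccurrence, max_diff))
```

Raising `max_diff` loosens the filter: with a gap of 0 only candidates ranked identically in both lists survive, while larger gaps keep more candidates at the cost of less data reduction.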