Information Theory and Applications Workshop

Limits of Detecting Text Generated by Large-Scale Language Models



Abstract

Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns. Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated. We show that error exponents for particular language models are bounded in terms of their perplexity, a standard measure of language generation performance. Under the assumption that human language is stationary and ergodic, the formulation is extended from considering specific language models to considering maximum likelihood language models, among the class of k-order Markov approximations; error probabilities are characterized. Some discussion of incorporating semantic side information is also given.
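
A minimal worked sketch of how a perplexity bound of this kind can arise, using standard hypothesis-testing notation assumed here rather than taken from the paper (the symbols P, Q, C, and PPL are illustrative): let P be the distribution of genuine human text, Q the language model's output distribution over token sequences x^n, and consider the test

    H_0 : x^n \sim P \ (\text{genuine}) \quad \text{versus} \quad H_1 : x^n \sim Q \ (\text{generated}).

The best achievable error exponent of the optimal (likelihood-ratio) test is the Chernoff information C(P,Q), which is bounded by the Kullback-Leibler divergence:

    C(P,Q) \le D(P \,\|\, Q) = H(P,Q) - H(P) = \log_2 \mathrm{PPL}(Q) - H(P),

where H(P,Q) = \mathbb{E}_P[-\log_2 Q(X)] is the per-token cross-entropy and \mathrm{PPL}(Q) = 2^{H(P,Q)} is the model's perplexity measured on human text. Under this reading, as a model's perplexity approaches 2^{H(P)}, the perplexity of human language itself, the exponent is forced toward zero and detection error can no longer decay exponentially in the text length.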
