Information Theory and Applications Workshop

Limits of Detecting Text Generated by Large-Scale Language Models



Abstract

Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns. Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated. We show that error exponents for particular language models are bounded in terms of their perplexity, a standard measure of language generation performance. Under the assumption that human language is stationary and ergodic, the formulation is extended from considering specific language models to considering maximum likelihood language models, among the class of k-order Markov approximations; error probabilities are characterized. Some discussion of incorporating semantic side information is also given.
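
A minimal worked sketch of how a perplexity bound of this kind can arise, using standard hypothesis-testing notation assumed here rather than taken from the paper (the symbols P, Q, C, and PPL are illustrative): let P be the distribution of genuine human text, Q the language model's output distribution over token sequences x^n, and consider the test

    H_0 : x^n \sim P \ (\text{genuine}) \quad \text{versus} \quad H_1 : x^n \sim Q \ (\text{generated}).

The best achievable error exponent of the optimal (likelihood-ratio) test is the Chernoff information C(P,Q), which is bounded by the Kullback-Leibler divergence:

    C(P,Q) \le D(P \,\|\, Q) = H(P,Q) - H(P) = \log_2 \mathrm{PPL}(Q) - H(P),

where H(P,Q) = \mathbb{E}_P[-\log_2 Q(X)] is the per-token cross-entropy and \mathrm{PPL}(Q) = 2^{H(P,Q)} is the model's perplexity measured on human text. Under this reading, as a model's perplexity approaches 2^{H(P)}, the perplexity of human language itself, the exponent is forced toward zero and detection error can no longer decay exponentially in the text length.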
