
A Study of English Neologisms Through Large-Scale Probabilistic Indexing of Bentham's Manuscripts


Abstract

Probabilistic indexes (PIs) are obtained from untranscribed handwritten text images by means of recently introduced lexicon-free, query-by-string, probabilistic keyword spotting techniques. PIs have proven to be a powerful tool that allows efficient, free textual searching in very large collections of handwritten historical documents. PIs convey uncertain information about the textual contents of the document images. However, this text uncertainty is accurately modeled by the associated lexical probability distributions, which can be conveniently exploited in many applications. As an example of such applications, we study here the dating of a number of English neologisms in the large collection of Bentham's manuscripts, which encompasses 90 000 images. The statistical techniques used for neologism dating are theoretically motivated, and experiments on this collection are reported. Among other interesting contributions, this study provides sound evidence that some commonly assumed neologism introduction dates need to be revised.
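To illustrate how the lexical probabilities in a PI might be exploited for dating, the following is a minimal sketch and not the statistical technique used in the study. It assumes, purely for illustration, that each PI entry can be reduced to a `(year, word, probability)` triple, sums the probabilities per year to get an expected number of occurrences, and reports the earliest year in which that expectation reaches a chosen threshold. The function names, the threshold, the word `neoword`, and the toy data are all hypothetical.

```python
from collections import defaultdict


def expected_counts_by_year(spots, word):
    """Sum the relevance probabilities of `word` per year.

    `spots` is an iterable of (year, word, probability) triples.
    This layout is an assumption made for this sketch, not the
    actual PI format. The per-year sum is the expected number of
    occurrences of the word in images dated to that year.
    """
    totals = defaultdict(float)
    for year, token, prob in spots:
        if token == word:
            totals[year] += prob
    return dict(totals)


def earliest_year(counts, threshold=1.0):
    """Return the earliest year whose expected count reaches
    `threshold`, or None if no year does."""
    hits = [year for year, count in counts.items() if count >= threshold]
    return min(hits) if hits else None


# Made-up toy data: (year, word, relevance probability).
spots = [
    (1780, "neoword", 0.25),
    (1789, "neoword", 0.75),
    (1789, "neoword", 0.5),
    (1802, "neoword", 0.5),
    (1789, "otherword", 0.5),
]

counts = expected_counts_by_year(spots, "neoword")
first = earliest_year(counts, threshold=1.0)
print(counts, first)
```

Working with expected counts rather than hard matches means that several moderately confident spots in the same year can jointly support a date, while a single low-probability spot does not.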
