Journal of Open Research Software

Efficient n-gram, Skipgram and Flexgram Modelling with Colibri Core

Abstract

Counting n-grams lies at the core of any frequentist corpus analysis and is often considered a trivial matter. Going beyond consecutive n-grams to patterns such as skipgrams and flexgrams increases the demand for efficient solutions, and the need to operate on big corpus data raises that demand even further. Lossless compression and non-trivial algorithms are needed to lower memory demands while retaining good speed. Colibri Core is software for the efficient computation and querying of n-grams, skipgrams and flexgrams from corpus data. The resulting pattern models can be analysed and compared in various ways. The software offers a programming library for C++ and Python, as well as command-line tools.
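The pattern types above can be made concrete with a small plain-Python sketch. This is not Colibri Core's API. It assumes whitespace tokenisation and uses hypothetical helpers (`class_encode`, `count_ngrams`, `count_skipgrams`, `decode`) and a hypothetical `{*}` notation for a one-token gap. Word types are first mapped to integer IDs, with the most frequent types first, as one simple way to avoid storing repeated strings. N-grams are then counted as consecutive windows, and skipgrams as windows in which one or more inner tokens are replaced by a gap. Flexgrams, whose gaps may span a variable number of tokens, are not covered.

```python
from collections import Counter
from itertools import combinations

GAP = 0  # hypothetical marker: class ID 0 stands for a one-token gap


def class_encode(tokens):
    """Hypothetical helper: map each word type to an integer ID,
    most frequent types first (ties broken alphabetically)."""
    freq = Counter(tokens)
    ranked = sorted(freq, key=lambda w: (-freq[w], w))
    vocab = {w: i for i, w in enumerate(ranked, start=1)}  # IDs start at 1; 0 is GAP
    return [vocab[w] for w in tokens], vocab


def count_ngrams(ids, n):
    """Count all consecutive n-grams as tuples of class IDs."""
    return Counter(tuple(ids[i:i + n]) for i in range(len(ids) - n + 1))


def count_skipgrams(ids, n):
    """Count skipgrams: n-token windows with one or more inner positions
    replaced by GAP. First and last tokens are always kept."""
    counts = Counter()
    inner = range(1, n - 1)
    for i in range(len(ids) - n + 1):
        window = ids[i:i + n]
        for k in range(1, n - 1):  # number of gapped positions
            for gaps in combinations(inner, k):
                pattern = tuple(GAP if j in gaps else t for j, t in enumerate(window))
                counts[pattern] += 1
    return counts


def decode(pattern, vocab):
    """Turn a tuple of class IDs back into readable text."""
    inverse = {i: w for w, i in vocab.items()}
    return " ".join("{*}" if t == GAP else inverse[t] for t in pattern)


tokens = "the cat sat on the mat the dog sat on the rug".split()
ids, vocab = class_encode(tokens)
trigrams = count_ngrams(ids, 3)
skipgrams = count_skipgrams(ids, 3)

for pattern, count in skipgrams.items():
    if count >= 2:
        print(decode(pattern, vocab), count)
# the {*} sat 2
# sat {*} the 2
```

Because a skipgram abstracts over the word in its gap, `the {*} sat` is counted twice even though `the cat sat` and `the dog sat` each occur only once. Each n-token window yields 2^(n-2) - 1 gap configurations. This is one reason skipgram counting becomes far more demanding than plain n-gram counting on large corpora.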