Smaller Self-indexes for Natural Language

Abstract

Self-indexes for natural-language texts, where these are regarded as token (word or separator) sequences, achieve very attractive space and search time. However, they suffer from a space penalty due to their large vocabulary. In this paper we show that by replacing the Huffman encoding they implicitly use with the slightly weaker Hu-Tucker encoding, which respects the lexical order of the vocabulary, both their space and time are improved.
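
The sense in which an order-preserving code is "slightly weaker" than Huffman can be illustrated with a small sketch. It is not taken from the paper: the function names `huffman_cost` and `alphabetic_cost` are hypothetical, and the second function uses a simple O(n^3) dynamic program rather than the Hu-Tucker algorithm itself. The dynamic program should reach the same optimal cost that Hu-Tucker attains for an alphabetic (order-preserving) prefix code, only more slowly. Frequencies are assumed to be listed in the lexical order of the vocabulary.

```python
import heapq
from functools import lru_cache


def huffman_cost(freqs):
    """Total coded length (sum of freq * code length) of a Huffman code."""
    heap = list(freqs)
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        cost += a + b  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, a + b)
    return cost


def alphabetic_cost(freqs):
    """Total coded length of an optimal order-preserving prefix code.

    Assumption: plain interval DP, not Hu-Tucker; same optimum, O(n^3) time.
    """
    n = len(freqs)
    if n == 0:
        return 0
    prefix = [0]
    for f in freqs:
        prefix.append(prefix[-1] + f)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Cheapest tree whose leaves are symbols i..j, kept in that order.
        if i == j:
            return 0
        weight = prefix[j + 1] - prefix[i]
        return weight + min(best(i, k) + best(k + 1, j) for k in range(i, j))

    return best(0, n - 1)


if __name__ == "__main__":
    freqs = [1, 5, 1, 5]  # made-up frequencies, in lexical order of the words
    print(huffman_cost(freqs), alphabetic_cost(freqs))
```

With these made-up frequencies the Huffman code costs 21 bits in total and the best order-preserving code costs 24, so the order constraint can only keep the coded length equal or make it larger. The improvement stated in the abstract concerns the index as a whole, which this sketch does not model.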

Bibliographic details

Source: International Symposium on String Processing and Information Retrieval, 2012, pp. 372-378 (7 pages)
Authors: Nieves R. Brisaboa; Gonzalo Navarro; Alberto Ordóñez
Original format: PDF

Similar literature

1. Word-Based Self-Indexes for Natural Language Text [J]. Antonio Farina, Nieves R. Brisaboa, Gonzalo Navarro. ACM Transactions on Information Systems. 2012, No. 1
2. Dependency distances in natural mixed languages: Comment on "Dependency distance: A new perspective on syntactic patterns in natural languages" by Haitao Liu et al. [J]. Wang Lin. Physics of Life Reviews. 2017
3. Natural Language is a Programming Language: Applying Natural Language Processing to Software Development [J]. Michael D. Ernst. LIPIcs: Leibniz International Proceedings in Informatics. 2017, No. 1
4. Smaller Self-indexes for Natural Language [C]. Nieves R. Brisaboa, Gonzalo Navarro, Alberto Ordóñez. International Symposium on String Processing and Information Retrieval. 2012
5. Natural language program analysis: Combining natural language processing with program analysis to improve software maintenance tools [D]. Shepherd, David. 2007
6. Controlled Vocabularies Indexing and Medical Language Processing. Medical Language Processing: Database Capture of Natural Language Echocardiographic Reports: A Unified Medical Language System Approach [O]. K. Canfield, B. Bray, S. Huff. 1989
7. Smaller Self-Indexes for Natural Language [O]. Nieves R. Brisaboa, Gonzalo Navarro, Alberto Ordóñez. 2013
