HPS: High precision stemmer

Tomas Brychcin; Miloslav Konopik

Abstract

Research into unsupervised ways of stemming has resulted, in the past few years, in the development of methods that are reliable and perform well. Our approach further shifts the boundaries of the state of the art by providing more accurate stemming results. The idea of the approach consists in building a stemmer in two stages. In the first stage, a stemming algorithm based upon clustering, which exploits the lexical and semantic information of words, is used to prepare large-scale training data for the second-stage algorithm. The second-stage algorithm uses a maximum entropy classifier. The stemming-specific features help the classifier decide when and how to stem a particular word. In our research, we have pursued the goal of creating a multi-purpose stemming tool. Its design opens up possibilities of solving non-traditional tasks such as approximating lemmas or improving language modeling. However, we still aim at very good results in the traditional task of information retrieval. The conducted tests reveal exceptional performance in all the above mentioned tasks. Our stemming method is compared with three state-of-the-art statistical algorithms and one rule-based algorithm. We used corpora in the Czech, Slovak, Polish, Hungarian, Spanish and English languages. In the tests, our algorithm excels in stemming previously unseen words (the words that are not present in the training set). Moreover, it was discovered that our approach demands very little text data for training when compared with competing unsupervised algorithms.
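The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the hand-written `CLUSTERS` stand in for the output of the first-stage clustering (which in the paper uses lexical and semantic information), the suffix features are a hypothetical simplification of the stemming-specific features, and the classifier is a small maximum entropy (multinomial logistic regression) model trained by plain gradient ascent. All function names are assumptions made for illustration.

```python
import math
from collections import defaultdict
from os.path import commonprefix

# Hypothetical stage-1 output: clusters of related word forms.
# (Assumption: the real first stage builds these automatically.)
CLUSTERS = [
    ["walking", "walked", "walks", "walk"],
    ["jumping", "jumped", "jumps", "jump"],
    ["playing", "played", "plays", "play"],
]

def training_pairs(clusters):
    """Label each word with the number of trailing characters to strip."""
    pairs = []
    for cluster in clusters:
        stem = commonprefix(cluster)  # longest common prefix as a stand-in stem
        for word in cluster:
            pairs.append((word, len(word) - len(stem)))
    return pairs

def features(word):
    """Illustrative features: a bias term plus suffixes of length 1 to 3."""
    return ["bias"] + [f"suf={word[-n:]}" for n in range(1, 4) if len(word) > n]

def probabilities(weights, feats, labels):
    """Softmax over labels; each score is the sum of the active feature weights."""
    scores = [sum(weights[(f, y)] for f in feats) for y in labels]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(pairs, epochs=200, rate=0.1):
    """Stage 2: fit maximum entropy weights by stochastic gradient ascent."""
    labels = sorted({y for _, y in pairs})
    weights = defaultdict(float)
    for _ in range(epochs):
        for word, gold in pairs:
            feats = features(word)
            probs = probabilities(weights, feats, labels)
            for y, p in zip(labels, probs):
                # Gradient of the log-likelihood: observed minus expected count.
                step = rate * ((1.0 if y == gold else 0.0) - p)
                for f in feats:
                    weights[(f, y)] += step
    return weights, labels

def stem(word, weights, labels):
    """Strip the number of characters the classifier finds most probable."""
    probs = probabilities(weights, features(word), labels)
    strip = labels[probs.index(max(probs))]
    return word[: len(word) - strip]

weights, labels = train(training_pairs(CLUSTERS))
print(stem("talking", weights, labels))  # unseen word; prints "talk"
```

Because the classifier sees only suffix features, a word that never appeared in the training clusters (here "talking") can still be stemmed, which mirrors the abstract's point about previously unseen words. A real system would need far richer features and training data than this toy example.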

Bibliographic details

Source
Information Processing & Management | 2015, Issue 1 | pp. 68-91 | 24 pages
Authors
Tomas Brychcin; Miloslav Konopik
Author affiliations

Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia, Univerzitni 8, 306 14 Plzen, Czech Republic; NTIS - New Technologies for the Information Society, Faculty of Applied Sciences, University of West Bohemia, Univerzitni 8, 306 14 Plzen, Czech Republic

Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia, Univerzitni 8, 306 14 Plzen, Czech Republic; NTIS - New Technologies for the Information Society, Faculty of Applied Sciences, University of West Bohemia, Univerzitni 8, 306 14 Plzen, Czech Republic

Indexing information
Original format: PDF
Language: English
Keywords
Stemming; Morphology; Maximum entropy; Maximum mutual information; Language modeling; Information retrieval;
