Splitting compounds with ngrams

Abstract

Compound words with unmarked word boundaries are problematic for many tasks in NLP and computational linguistics, including information extraction, machine translation, and syllabification. This paper introduces a simple, proof-of-concept language modeling approach to automatic compound segmentation, demonstrated with Finnish. The approach utilizes an off-the-shelf morphological analyzer to split training words into their constituent morphemes. A language model is subsequently trained on ngrams composed of morphemes, morpheme boundaries, and word boundaries. Finally, linguistic constraints are used to weed out phonotactically ill-formed segmentations, thereby allowing the language model to select the best grammatical segmentation. This approach achieves an accuracy of ~97%.
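
The pipeline described above can be sketched in Python. The steps are: analyze training words into morphemes, train an n-gram model over morphemes and boundary symbols, discard ill-formed candidate segmentations, and let the model pick the best remaining one. This is a minimal illustration under stated assumptions, not the paper's implementation. The hand-written toy analyses stand in for the off-the-shelf morphological analyzer's output, and the boundary symbols `+` and `#` are hypothetical. The add-one-smoothed bigram model is one arbitrary choice of n-gram model. Candidates are generated by covering the input with morphemes seen in training. The vowel-per-word check is a hypothetical stand-in for the paper's phonotactic constraints.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical boundary symbols; the paper's actual notation may differ.
MORPH_BOUNDARY = "+"   # morpheme boundary inside a word
WORD_BOUNDARY = "#"    # word boundary between compound constituents
VOWELS = set("aeiouyäö")

# Toy, hand-written analyses standing in for a morphological analyzer's output.
TOY_ANALYSES = [
    ["kirja", "+", "sto"],       # kirjasto 'library'
    ["kirja", "#", "kauppa"],    # kirjakauppa 'bookstore'
    ["talo", "+", "t"],          # talot 'houses'
    ["kortti"],                  # kortti 'card'
]


def train(analyses):
    """Train an add-one-smoothed bigram model on boundary-marked token sequences."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for tokens in analyses:
        seq = ["<s>"] + tokens + ["</s>"]
        unigrams.update(seq[:-1])          # counts of each token used as a context
        bigrams.update(zip(seq, seq[1:]))  # counts of adjacent token pairs
        vocab.update(seq[1:])              # tokens the model can predict
    morphemes = vocab - {MORPH_BOUNDARY, WORD_BOUNDARY, "</s>"}
    return unigrams, bigrams, len(vocab), morphemes


def log_prob(tokens, unigrams, bigrams, vocab_size):
    """Sum of smoothed bigram log-probabilities, including start/end markers."""
    seq = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(v, w)] + 1) / (unigrams[v] + vocab_size))
        for v, w in zip(seq, seq[1:])
    )


def morpheme_splits(word, morphemes):
    """Yield every way to cover `word` exactly with known morphemes."""
    if not word:
        yield []
        return
    for i in range(1, len(word) + 1):
        if word[:i] in morphemes:
            for rest in morpheme_splits(word[i:], morphemes):
                yield [word[:i]] + rest


def candidates(word, morphemes):
    """Interleave each morpheme split with every choice of boundary symbol."""
    for split in morpheme_splits(word, morphemes):
        if not split:
            continue
        for marks in product([MORPH_BOUNDARY, WORD_BOUNDARY], repeat=len(split) - 1):
            tokens = [split[0]]
            for mark, morph in zip(marks, split[1:]):
                tokens += [mark, morph]
            yield tokens


def well_formed(tokens):
    """Hypothetical stand-in constraint: every word between '#' marks has a vowel."""
    word = ""
    for tok in tokens + [WORD_BOUNDARY]:
        if tok == WORD_BOUNDARY:
            if not VOWELS & set(word):
                return False
            word = ""
        elif tok != MORPH_BOUNDARY:
            word += tok
    return True


def segment(word, model):
    """Return the best-scoring well-formed segmentation, or None if none exists."""
    unigrams, bigrams, vocab_size, morphemes = model
    scored = [
        (log_prob(c, unigrams, bigrams, vocab_size), c)
        for c in candidates(word, morphemes)
        if well_formed(c)
    ]
    return "".join(max(scored)[1]) if scored else None


if __name__ == "__main__":
    model = train(TOY_ANALYSES)
    print(segment("kirjastokortti", model))  # kirja+sto#kortti
    print(segment("kirjat", model))          # kirja+t
```

With these toy counts, `kirjastokortti` ('library card') is segmented as `kirja+sto#kortti`. `kirjat` ('books') comes out as `kirja+t` because the alternative `kirja#t` leaves a vowelless word that the constraint rejects. A realistic system would need far more training data, a higher-order model with better smoothing, and a way to handle morphemes unseen in training.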

Bibliographic details

Source: International Conference on Computational Linguistics, 2016, pp. 630-640 (11 pages)
Author: Naomi Tachikawa Shapiro
Original format: PDF

Similar documents

1. Photocatalytic splitting of CS₂ to S₈ and a carbon-sulfur polymer catalyzed by a bimetallic ruthenium(II) compound with a tertiary amine binding site: Toward photocatalytic splitting of CO₂? [J]. Livanov K., Madhu V., Balaraman E. Inorganic Chemistry: A Research Journal that Includes Bioinorganic, Catalytic, Organometallic, Solid-State, and Synthetic Chemistry and Reaction Dynamics, 2011, no. 22.

2. Black Swan Years in American English, French, German, Hebrew, and Russian: Years That Reverberate in Ngram Viewer [J]. William H. Zywiak, Ronald P. Bobroff, Gao Niu. Advances in Historical Studies, 2021, no. 3.

3. Love, Hope, Perspective, and Leadership in the Ngram Database: Solace for Modern Times [J]. William H. Zywiak, Gao Niu. Open Journal of Social Sciences, 2021, no. 11.

4. Splitting compounds with ngrams [C]. Naomi Tachikawa Shapiro. International Conference on Computational Linguistics, 2016.

5. A new method to test shear wave splitting: Improving statistical assessment of splitting parameters [D]. Corbalan Castejon, Ana. 2016.

6. Compounding Plasmon–Exciton Strong Coupling System with Gold Nanofilm to Boost Rabi Splitting [O]. Tingting Song, Zhanxu Chen, Wenbo Zhang. 2019.

7. Automatic Compound Processing: Compound Splitting and Semantic Analysis for Afrikaans and Dutch [O]. Ben Verhoeven, Walter Daelemans, Menno Van Zaanen. 2016.

8. Core-Level Satellites and Outer Core-Level Multiplet Splitting in Mn Model Compounds [R]. Nelson, A. J., Reynolds, J. G., Roos, J. W. 1999.
