From Segmentation to Analyses: A Probabilistic Model for Unsupervised Morphology Induction

Abstract

A major motivation for unsupervised morphological analysis is to reduce the sparse data problem in under-resourced languages. Most previous work focuses on segmenting surface forms into their constituent morphs (e.g., taking: tak +ing), but surface form segmentation does not solve the sparse data problem as the analyses of take and taking are not connected to each other. We extend the MorphoChains system (Narasimhan et al., 2015) to provide morphological analyses that can abstract over spelling differences in functionally similar morphs. These analyses are not required to use all the orthographic material of a word (stopping: stop +ing), nor are they limited to only that material (acidified: acid +ify +ed). On average across six typologically varied languages, our system has a similar or better F-score on EMMA (a measure of underlying morpheme accuracy) than three strong baselines; moreover, the total number of distinct morphemes identified by our system is on average 12.8% lower than for Morfessor (Virpioja et al., 2013), a state-of-the-art surface segmentation system.
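The contrast between surface segmentation and morphological analysis can be illustrated with a toy example. The sketch below is hypothetical: both tables are hand-written from the abstract's own examples (take/taking, stopping, acidified), not output from MorphoChains or Morfessor, and the helper names `morpheme_inventory` and `shared_stem` are illustrative only. It shows two things the abstract argues: analyses connect spelling variants of the same morpheme, and they can yield a smaller inventory of distinct units.

```python
# Toy illustration (hypothetical, hand-written data; not the paper's output).

# Surface segmentation: each morph is a substring of the word, so spelling
# variants of one morpheme ("tak" vs "take", "stopp" vs "stop") stay distinct.
segmentations = {
    "take": ["take"],
    "taking": ["tak", "ing"],
    "stop": ["stop"],
    "stopping": ["stopp", "ing"],
    "acid": ["acid"],
    "acidified": ["acid", "ifi", "ed"],
}

# Analysis: units are abstract morphemes that need not match the spelling.
# "stopping" drops one "p" (stop +ing); "acidified" restores the "y" of
# "-ify" that surfaces as "i" (acid +ify +ed).
analyses = {
    "take": ["take"],
    "taking": ["take", "+ing"],
    "stop": ["stop"],
    "stopping": ["stop", "+ing"],
    "acid": ["acid"],
    "acidified": ["acid", "+ify", "+ed"],
}


def morpheme_inventory(table):
    """Return the set of distinct morph/morpheme types used in a table."""
    return {unit for parts in table.values() for unit in parts}


def shared_stem(table, a, b):
    """True if words a and b share their first unit (assumes no prefixes)."""
    return table[a][0] == table[b][0]


seg_types = morpheme_inventory(segmentations)
ana_types = morpheme_inventory(analyses)
# Percentage reduction in distinct types, analysis relative to segmentation.
reduction = 100 * (len(seg_types) - len(ana_types)) / len(seg_types)

print(len(seg_types), len(ana_types), round(reduction, 1))
print(shared_stem(segmentations, "take", "taking"),
      shared_stem(analyses, "take", "taking"))
```

Run as-is, this prints `8 6 25.0` and then `False True`: the segmentation uses 8 distinct units against 6 for the analysis, and only the analysis links take and taking through a common stem. The 25% figure is an artifact of this six-word toy list and bears no relation to the 12.8% average reported in the abstract.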

Bibliographic details

Source: Conference of the European Chapter of the Association for Computational Linguistics, 2017, pp. 337-346 (10 pages)
Authors: Toms Bergmanis; Sharon Goldwater
Original format: PDF

Similar documents

1. Unsupervised Learning of Probabilistic Object Models (POMs) for Object Classification, Segmentation, and Recognition Using Knowledge Propagation [J]. Chen Yuanhao, Zhu Long (Leo), Yuille Alan. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, no. 10.
2. Semantic role induction in Persian: An unsupervised approach by using probabilistic models [J]. Saeedi Parisa, Faili Heshaam, Shakery Azadeh. Literary & Linguistic Computing, 2016, no. 1.
3. U-RSNet: An unsupervised probabilistic model for joint registration and segmentation [J]. Qiu Liang, Ren Hongliang. Neurocomputing, 2021, issue of Aug. 25.
4. From Segmentation to Analyses: A Probabilistic Model for Unsupervised Morphology Induction [C]. Toms Bergmanis, Sharon Goldwater. Conference of the European Chapter of the Association for Computational Linguistics, 2017.
5. Toward language-independent morphological segmentation and part-of-speech induction [D]. Dasgupta, Sajib. 2007.
6. Probabilistic Modelling for Unsupervised Analysis of Human Behaviour in Smart Cities [O]. Yazan Qarout, Yordan P. Raykov, Max A. Little. 2020.
7. From Segmentation to Analyses: A Probabilistic Model for Unsupervised Morphology Induction [O]. Bergmanis, Toms, Goldwater, Sharon. 2017.