

Comparing the Performance of Feature Representations for the Categorization of the Easy-to-Read Variety vs Standard Language



Abstract

We explore the effectiveness of four feature representations (bag-of-words, word embeddings, principal components and autoencoders) for the binary categorization of the easy-to-read variety vs standard language. "Standard language" refers to the ordinary language variety used by a population as a whole or by a community, while the "easy-to-read" variety is a simpler (or simplified) version of the standard language. We test the efficiency of these feature representations on three corpora, which differ in size, class balance, unit of analysis, language and topic. We rely on supervised and unsupervised machine learning algorithms. Results show that bag-of-words is a robust and straightforward feature representation for this task and performs well in many experimental settings. Its performance is comparable or equal to that achieved with principal components and autoencoders, whose preprocessing is, however, more time-consuming. Word embeddings are less accurate than the other feature representations for this classification task.
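
To make the bag-of-words representation concrete, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the toy sentences and labels are invented for illustration and do not come from the three corpora, the function names are hypothetical, and multinomial Naive Bayes with add-one smoothing is swapped in here simply as one easy-to-read supervised classifier, since the abstract does not name the algorithms used.

```python
from collections import Counter
import math

# Hypothetical toy data, invented for illustration only.
TEXTS = [
    "the cat sat on the mat",
    "we eat food every day",
    "the committee deliberated extensively regarding fiscal policy",
    "notwithstanding prior objections the legislation was ratified",
]
LABELS = ["easy", "easy", "standard", "standard"]


def bag_of_words(text):
    """Map a text to token counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def train_naive_bayes(texts, labels):
    """Collect per-class document counts, per-class token counts and the vocabulary."""
    class_counts = Counter(labels)
    token_counts = {label: Counter() for label in class_counts}
    for text, label in zip(texts, labels):
        token_counts[label].update(bag_of_words(text))
    vocabulary = set()
    for counts in token_counts.values():
        vocabulary.update(counts)
    return class_counts, token_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log-probability under add-one smoothing."""
    class_counts, token_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_tokens = sum(token_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for token, count in bag_of_words(text).items():
            if token not in vocabulary:
                continue  # ignore tokens never seen in training
            p = (token_counts[label][token] + 1) / (total_tokens + len(vocabulary))
            score += count * math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(TEXTS, LABELS)
print(predict(model, "the cat eat food"))
print(predict(model, "the committee ratified fiscal legislation"))
```

The point of the sketch is only that bag-of-words needs no preprocessing beyond tokenizing and counting, which is consistent with the abstract's description of it as a straightforward representation. Principal components or autoencoders would add a dimensionality-reduction step on top of such count vectors.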


Bibliographic record

Source: Nordic conference of computational Linguistics | 2019 | xx 410 p. | 10 pages
Authors: Marina Santini; Benjamin Danielsson; Arne Jönsson
Original format: PDF
Chinese Library Classification: Programming, software engineering
Date indexed: 2022-08-20 20:19:25

