Automated Non-Content Word List Generation Using hLDA

Abstract

In this paper, we present a language-independent method, based on hierarchical latent Dirichlet allocation (hLDA), for the automatic, unsupervised extraction of non-content words from a corpus of documents. This method permits the creation of word lists that may be used in place of traditional function word lists in various natural language processing tasks. As an example, we generated word lists from a corpus of English, Chinese, and Russian posts extracted from Wikipedia articles and Wikipedia Wikitalk discussion pages. We applied these lists to authorship attribution on this corpus, comparing the effectiveness of the hLDA-extracted lists with expert-created function word lists and with frequent word lists (a common alternative to function word lists). The hLDA lists perform comparably to the frequent word lists. The trials also show that corpus-derived lists tend to perform better than more generic lists, and both sets of generated lists significantly outperformed the expert lists. Additionally, we evaluated an English expert list on machine translations of our Chinese and Russian documents and found that our method also outperforms this alternative.
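The abstract compares hLDA-derived lists with frequent word lists on an authorship attribution task. The hLDA extraction step itself is not described here, so the following is only a minimal sketch of the comparison side, under stated assumptions: it builds a frequent word list from a corpus and uses any word list (frequent, expert, or hLDA-derived) as a relative-frequency feature profile. The function names, the nearest-centroid classifier, and the Euclidean distance are hypothetical choices for illustration, not the paper's experimental setup. The `\w+` tokenizer also assumes whitespace-delimited text; unsegmented Chinese would need a word segmenter first.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase word-character runs; Chinese text would need
    # a real segmenter, since \w+ keeps a run of Han characters together.
    return re.findall(r"\w+", text.lower())


def frequent_word_list(docs, k):
    """Hypothetical helper: the k most frequent tokens across the corpus."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return [word for word, _ in counts.most_common(k)]


def profile(text, word_list):
    """Relative frequency of each list word in one text."""
    tokens = tokenize(text)
    total = len(tokens) or 1  # avoid division by zero on empty text
    counts = Counter(tokens)
    return [counts[w] / total for w in word_list]


def attribute(unknown, training, word_list):
    """Hypothetical nearest-centroid attribution over word-list profiles.

    training maps an author name to a list of that author's texts.
    """
    centroids = {}
    for author, texts in training.items():
        vectors = [profile(t, word_list) for t in texts]
        # Average each feature column to get the author's centroid.
        centroids[author] = [sum(col) / len(vectors) for col in zip(*vectors)]
    target = profile(unknown, word_list)
    return min(centroids, key=lambda a: math.dist(target, centroids[a]))


if __name__ == "__main__":
    training = {
        "alice": ["the cat sat on the mat and the dog sat too",
                  "the sun is on the hill and the sky is blue"],
        "bob": ["a bird of a feather is a friend of a kind",
                "of all the birds a crow is a bird of note"],
    }
    corpus = [t for texts in training.values() for t in texts]
    words = frequent_word_list(corpus, k=5)
    print(words)
    print(attribute("the fox sat on the log and the owl sat", training, words))
```

Because `attribute` accepts any list of words, swapping in an expert function word list or an hLDA-derived list changes only the `word_list` argument, which mirrors the kind of like-for-like comparison the abstract describes.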

Bibliographic Details

Source: International Florida Artificial Intelligence Research Society Conference, 2013, 6 pages
Authors: Wayne Krug; Marc T. Tomlinson
Original format: PDF
Chinese Library Classification (CLC): TP18-53

Similar Documents

1. IMPLEMENTATION OF IMPROVED LEVENSHTEIN ALGORITHM FOR SPELLING CORRECTION WORD CANDIDATE LIST GENERATION [J]. Hanan Najm Abdulkhudhur, Imad Qasim Habeeb, Yuhanis Yusof. Journal of Theoretical and Applied Information Technology, 2016, no. 3.

2. BRBN-T validation: adaptation of the Selective Reminding Test and Word List Generation [J]. Ana Margarida Passos, Andreia Sá, Aristides Ferreira. Arquivos de Neuro-Psiquiatria, 2015, no. 10.

3. Patterns of word-list generation in mild cognitive impairment and Alzheimer's disease [J]. Brandt J, Manning KJ. The Clinical Neuropsychologist, 2009, no. 5.

4. Automated Non-Content Word List Generation Using hLDA [C]. Wayne Krug, Marc T. Tomlinson. Proceedings of the Twenty-Sixth International Florida Artificial Intelligence Research Society Conference, 2013.

5. The Trinity is Not Just a List of Three Words: Theology, Scripture and Politics in the Patristic Twelve Prophet Commentaries [D]. Jett, Mary Julia, 2017.

6. PATTERNS OF WORD-LIST GENERATION IN MILD COGNITIVE IMPAIRMENT AND ALZHEIMER’S DISEASE [O]. Jason Brandt, Kevin J. Manning.

7. Automatically generation and evaluation of Stop words list for Chinese Patents [O]. Deng Na, Chen Xu, 2015.
