Automated Non-Content Word List Generation Using hLDA

Abstract

In this paper, we present a language-independent method for the automatic, unsupervised extraction of non-content words from a corpus of documents. This method permits the creation of word lists that may be used in place of traditional function word lists in various natural language processing tasks. As an example we generated lists of words from a corpus of English, Chinese, and Russian posts extracted from Wikipedia articles and Wikipedia Wikitalk discussion pages. We applied these lists to the task of authorship attribution on this corpus to compare the effectiveness of lists of words extracted with this method to expert-created function word lists and frequent word lists (a common alternative to function word lists). hLDA lists perform comparably to frequent word lists. The trials also show that corpus-derived lists tend to perform better than more generic lists, and both sets of generated lists significantly outperformed the expert lists. Additionally, we evaluated the performance of an English expert list on machine translations of our Chinese and Russian documents, showing that our method also outperforms this alternative.
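The abstract mentions frequent word lists as a common alternative to expert-created function word lists. While hLDA itself requires a hierarchical topic-model implementation, the frequent-word baseline rests on a simple observation: the highest-frequency words in any sizable corpus are dominated by non-content (function) words. A minimal sketch of that baseline, using a hypothetical toy corpus, could look like this:

```python
from collections import Counter

def frequent_word_list(documents, top_n=10):
    """Rank words by raw corpus frequency.

    In a large corpus the top of this ranking is dominated by
    non-content words, which is why frequent word lists serve as
    a language-independent stand-in for function word lists.
    """
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return [word for word, _ in counts.most_common(top_n)]

# Hypothetical toy corpus; a real application would use whole documents.
docs = [
    "the cat sat on the mat",
    "the dog lay on the rug",
    "a bird flew over the house",
]
print(frequent_word_list(docs, top_n=3))
```

Even on three short sentences, determiners and prepositions ("the", "on") rise to the top of the ranking, while content words remain in the long tail. The corpus-derived hLDA lists described in the paper aim to improve on this baseline by separating topic-neutral vocabulary from merely common content words.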