Automatically Constructing a Normalisation Dictionary for Microblogs


Abstract

Microblog normalisation methods often utilise complex models and struggle to differentiate between correctly-spelled unknown words and lexical variants of known words. In this paper, we propose a method for constructing a dictionary of lexical variants of known words that facilitates lexical normalisation via simple string substitution (e.g. tomorrow for tmrw). We use context information to generate possible variant and normalisation pairs and then rank these by string similarity. Highly-ranked pairs are selected to populate the dictionary. We show that a dictionary-based approach achieves state-of-the-art performance for both F-score and word error rate on a standard dataset. Compared with other methods, this approach offers a fast, lightweight and easy-to-use solution, and is thus suitable for high-volume microblog pre-processing.
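The pipeline described in the abstract (generate variant/normalisation pairs from context, rank them by string similarity, then normalise by substitution) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the similarity measures or thresholds, so cosine similarity over neighbouring words and `difflib.SequenceMatcher` are stand-ins, and the names `context_signatures`, `build_dictionary`, `normalise`, `vocabulary`, `min_context` and `min_string` are all hypothetical.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def context_signatures(tweets, window=1):
    """Map each token to a Counter of (offset, neighbouring token) features."""
    contexts = defaultdict(Counter)
    for tweet in tweets:
        tokens = tweet.lower().split()
        for i, tok in enumerate(tokens):
            for off in range(-window, window + 1):
                j = i + off
                if off != 0 and 0 <= j < len(tokens):
                    contexts[tok][(off, tokens[j])] += 1
    return contexts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (sum(v * v for v in a.values()) * sum(v * v for v in b.values())) ** 0.5
    return dot / norm if norm else 0.0


def build_dictionary(tweets, vocabulary, min_context=0.5, min_string=0.5):
    """Pair each out-of-vocabulary token with its best in-vocabulary match.

    Candidates are first filtered by context similarity (assumed: cosine),
    then ranked by string similarity (assumed: SequenceMatcher ratio).
    Both thresholds are illustrative values, not taken from the paper.
    """
    contexts = context_signatures(tweets)
    dictionary = {}
    for oov in contexts:
        if oov in vocabulary:
            continue
        scored = [
            (SequenceMatcher(None, oov, iv).ratio(), iv)
            for iv in contexts
            if iv in vocabulary and cosine(contexts[oov], contexts[iv]) >= min_context
        ]
        if scored:
            score, best = max(scored)
            if score >= min_string:
                dictionary[oov] = best
    return dictionary


def normalise(text, dictionary):
    """Normalise text by plain token-level dictionary substitution."""
    return " ".join(dictionary.get(tok, tok) for tok in text.lower().split())


# Toy data, invented for illustration only.
tweets = [
    "see you tmrw morning",
    "see you tomorrow morning",
    "call me tmrw night",
    "call me tomorrow night",
]
vocabulary = {"see", "you", "tomorrow", "morning", "call", "me", "night"}
dictionary = build_dictionary(tweets, vocabulary)
print(normalise("see you tmrw", dictionary))
```

Once the dictionary is built, normalisation is a single lookup per token, which is the property the abstract points to when it calls the approach fast and lightweight.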


Bibliographic details

Source: Conference on empirical methods in natural language processing; Conference on computational natural language learning | 2012 | pp. 421-432 | 12 pages
Authors: Bo Han; Paul Cook; Timothy Baldwin
Original format: PDF

Similar literature

1. Using automatic constructed thesauri instead of dictionaries in the verbal phraseological units validation task [J]. Pinto David, Priego Belem. Journal of intelligent & fuzzy systems: Applications in Engineering and Technology. 2020, issue 2Pta2.
2. An Unsupervised Microblog Emotion Dictionary Construction Method and Its Application on Sentiment Analysis [J]. Hanshi Wang, Haining Xu, Lizhen Liu. Journal of information and computational science. 2015, issue 7.
3. The Slovenian Dictionary of Automatic Control, Systems and Robotics [J]. Juš Kocijan, Gorazd Karer, Mojca Žagar Karer. IFAC PapersOnLine. 2017, issue 1. (This project was supported by Automatic Control Society of Slovenia, University of Ljubljana and Jozef Stefan Institute, Slovenia.)
4. Automatically Constructing a Normalisation Dictionary for Microblogs [C]. Bo Han, Paul Cook, Timothy Baldwin. Conference on empirical methods in natural language processing. 2012.
5. Automatic extraction of lemma-based bilingual dictionaries for morphologically rich languages [D]. Saleh, Ibrahim Mohamed Hassan. 2009.
6. Measurement Matrix Optimization for Compressed Sensing System with Constructed Dictionary via Takenaka–Malmquist Functions [O]. Qiangrong Xu, Zhichao Sheng, Yong Fang. 2021.
7. Improving microblog retrieval from exterior corpus by automatically constructing a microblogging corpus [O]. Tu W, Mamoulis N, Cheung D. 2015.
8. Constructing a Lexicon from a Machine Readable Dictionary [R]. McHale, M. L., Crowter, J. J. 1994.
