A System for Adaptive Information Extraction from Highly Informal Text


Abstract

We present a first version of ADO, a system for Adaptive Data Organization, that is, information extraction from highly informal text: short text messages, classified ads, tweets, etc. It is built on a modular architecture that transparently integrates off-the-shelf NLP tools, general-purpose string-processing and machine learning procedures, and processes tailored to a domain. The system is called adaptive because it implements a semi-supervised approach: knowledge resources are initially built by hand and are then updated automatically with feeds from the corpus. This allows ADO to adapt to rapidly changing user-generated language. To estimate the impact of future developments, we carried out a preliminary evaluation of the system on a small corpus of Spanish-language classified advertisements from the real estate domain. This evaluation shows that tokenization and chunking can be handled well by simple techniques, but normalization, morphosyntactic tagging, and semantic tagging require either more complex techniques or a bigger training corpus.
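The adaptive loop described in the abstract (hand-built resources, later updated from the corpus) can be illustrated with a minimal Python sketch. This is not ADO's implementation: the seed lexicon, the function names, and the prefix-based update rule are all hypothetical, chosen only to show how a modular pipeline might absorb new abbreviations from unlabeled ads.

```python
import re
from collections import Counter

# Hypothetical hand-built seed lexicon: surface variant -> canonical form.
SEED_LEXICON = {
    "dorm": "dormitorio",
    "dormitorio": "dormitorio",
    "cocina": "cocina",
    "baño": "baño",
}

def tokenize(text):
    """Simple tokenization: lowercase, then split into runs of word characters."""
    return re.findall(r"\w+", text.lower())

def normalize(tokens, lexicon):
    """Map each token to its canonical form; unknown tokens pass through unchanged."""
    return [lexicon.get(t, t) for t in tokens]

def update_lexicon(corpus, lexicon, min_count=2):
    """Assumed update rule (illustrative only): an unknown token seen at least
    min_count times that is a prefix of exactly one canonical form is added
    as a new variant of that form."""
    canonical = set(lexicon.values())
    unknown = Counter(
        t for ad in corpus for t in tokenize(ad) if t not in lexicon
    )
    updated = dict(lexicon)  # leave the seed lexicon untouched
    for token, count in unknown.items():
        if count < min_count or len(token) < 3:
            continue
        matches = [c for c in canonical if c.startswith(token)]
        if len(matches) == 1:
            updated[token] = matches[0]
    return updated

def run_pipeline(text, lexicon):
    """Chain the stages; each stage is a plain function and can be swapped out."""
    return normalize(tokenize(text), lexicon)

corpus = ["Depto 2 dormit y cocina", "casa 3 dormit con baño"]
lexicon = update_lexicon(corpus, SEED_LEXICON)
print(run_pipeline("2 dormit", lexicon))
```

Keeping each stage as an independent function mirrors the modular design the abstract mentions: a stage can be replaced by an off-the-shelf tool without touching the others.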


Bibliographic details

Source
International Conference on Applications of Natural Language to Information Systems | 2011 | 8 pages
Authors
Laura Alonso i Alemany; Rafael Carrascosa
Original format: PDF
Chinese Library Classification: TP3-53
Keywords
User-generated text; Information extraction; Natural language processing suites


Similar literature


1. Text Detection based on Adaptive Edge Detection and its Application for Transparent Text Extraction [J]. Xinhao LIU, Naoki CHIBA. 電子情報通信学会技術研究報告. 2013, No. 495

2. Text Detection based on Adaptive Edge Detection and its Application for Transparent Text Extraction [J]. Xinhao LIU, Naoki CHIBA. 電子情報通信学会技術研究報告. パターン認識·メディア理解. Pattern Recognition and Media Understanding. 2012, No. 495

3. Sentiment analysis and spam detection in short informal text using learning classifier systems [J]. Muhammad Hassan Arif, Jianxin Li, Muhammad Iqbal. Soft computing: A fusion of foundations, methodologies and applications. 2018, No. 21

4. A System for Adaptive Information Extraction from Highly Informal Text [C]. Laura Alonso i Alemany, Rafael Carrascosa. Natural language processing and information systems. 2011

5. A sensor-based adaptive control constraint system for automatic spindle speed regulation to obtain highly stable milling [D]. Delio, Thomas Stone. 1989

6. Extractive text summarization system to aid data extraction from full text in systematic review development [O]. Duy Duc An Bui, Guilherme Del Fiol, John F. Hurdle, -1

7. Uncertainty Handling in Named Entity Extraction and Disambiguation for Informal Text [O]. van Keulen, Maurice; Habib, Mena Badieh. 2014
