Entity Extraction for Malayalam Social Media Text Using Structured Skip-gram Based Embedding Features from Unlabeled Data

G. Remmiya Devi; P.V. Veena; M. Anand Kumar; K.P. Soman


Abstract

Social media text is generally informal and noisy, but it sometimes carries informative content. Extracting such content, for example entities, is a challenging task. The main aim of this paper is to extract entities from Malayalam social media text efficiently. The social media corpus used in our system comes from the FIRE2015 entity extraction task. The data is first pre-processed, features are then extracted, and entity extraction follows. In addition to POS tags and conventional stylometric features such as prefixes, suffixes, and hashtags, unsupervised word embedding features obtained from a Structured Skip-gram model are used to train the system. The extracted features are given to a Support Vector Machine classifier to build and train the model. In testing, the system achieved better accuracy than the existing systems evaluated in the FIRE2015 task. The unsupervised features retrieved using the Structured Skip-gram model contribute to this improved performance.
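As a rough illustration of the feature set the abstract describes, the sketch below builds one feature dictionary per token from prefixes, suffixes, a hashtag flag, a POS tag, and a word embedding vector. It is not the authors' implementation: the function names, feature names, affix length, toy sentence, POS tags, and two-dimensional embedding values are all assumptions made for the example, and the embedding table stands in for vectors that would come from a trained Structured Skip-gram model.

```python
from typing import Dict, List, Sequence, Union

Feature = Union[str, bool, float]


def stylometric_features(token: str, max_affix: int = 3) -> Dict[str, Feature]:
    """Surface features for one token (feature names are hypothetical)."""
    feats: Dict[str, Feature] = {"is_hashtag": token.startswith("#")}
    # Character prefixes and suffixes of length 1..max_affix,
    # skipped when the token is shorter than the affix length.
    for n in range(1, max_affix + 1):
        if len(token) >= n:
            feats[f"prefix_{n}"] = token[:n]
            feats[f"suffix_{n}"] = token[-n:]
    return feats


def token_features(
    tokens: Sequence[str],
    pos_tags: Sequence[str],
    embeddings: Dict[str, List[float]],
    dim: int,
) -> List[Dict[str, Feature]]:
    """Combine stylometric, POS, and embedding features per token."""
    rows: List[Dict[str, Feature]] = []
    for token, pos in zip(tokens, pos_tags):
        feats = stylometric_features(token)
        feats["pos"] = pos
        # Tokens missing from the embedding table get a zero vector (assumption).
        vector = embeddings.get(token, [0.0] * dim)
        for i, value in enumerate(vector):
            feats[f"emb_{i}"] = value
        rows.append(feats)
    return rows


# Toy input; the tags and embedding values are made up for illustration.
tokens = ["#Kerala", "is", "green"]
pos_tags = ["NNP", "VBZ", "JJ"]
embeddings = {"#Kerala": [0.1, -0.2], "green": [0.3, 0.4]}

rows = token_features(tokens, pos_tags, embeddings, dim=2)
for row in rows:
    print(row)
```

Each dictionary could then be vectorized and passed to an SVM classifier together with the entity labels, for instance with scikit-learn's `DictVectorizer` and `LinearSVC`; the abstract does not say which SVM toolkit was used, so that pairing is only one possible choice.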


Bibliographic details

Source: Procedia Computer Science, 2016, Issue 1, 7 pages
Authors: G. Remmiya Devi; P.V. Veena; M. Anand Kumar; K.P. Soman
Original format: PDF
Chinese Library Classification: Computing technology, computer technology

