Semi-supervised learning approach for Indonesian Named Entity Recognition (NER) using co-training algorithm


Abstract

The main obstacle to applying machine learning to Indonesian Named Entity Recognition (NER) is the limited amount of labelled training data. Unlabelled data, by contrast, is widely available from many sources, which makes a semi-supervised learning approach a natural fit for the problem. This research aims to design a semi-supervised learning model for Indonesian NER. Semi-supervised co-training is used to exploit unlabelled data during NER learning and to produce new labelled data that can be used to improve a new NER classification system. The research uses two kinds of data: Indonesian DBPedia data as labelled data, and news article text from Indonesian news sites (kompas.com, cnnindonesia.com, tempo.co, merdeka.com and viva.co.id) as unlabelled data. The pre-processing steps applied to the unstructured text are sentence segmentation, tokenization, stemming, and PoS tagging. The output of pre-processing is the named entities and their context, which serve as unlabelled data for the semi-supervised co-training process. An SVM is used as the classification algorithm in this process, and 10-fold cross-validation is used to test the system. In testing, the NER system achieved a precision of 73.6%, a recall of 80.1% and a mean F1 score of 76.5%.
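The co-training loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the two views, the confidence threshold, or the entity types, so the sketch assumes one view built from the entity tokens and one from the context tokens, a hypothetical threshold of 0.6, and the hypothetical labels PERSON and LOCATION. To stay dependency-free it also swaps the paper's SVM for a tiny naive Bayes classifier; the toy sentences are invented for illustration.

```python
from collections import Counter, defaultdict
import math


class TokenNB:
    """Tiny multinomial naive Bayes over token lists (stand-in for the SVM)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)
        self.vocab = {tok for c in self.counts.values() for tok in c}
        return self

    def predict_proba(self, doc):
        # Laplace-smoothed log-likelihoods, normalised to probabilities.
        n = sum(self.prior.values())
        scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values()) + len(self.vocab) + 1
            logp = math.log(self.prior[c] / n)
            for tok in doc:
                logp += math.log((self.counts[c][tok] + 1) / total)
            scores[c] = logp
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def co_train(labelled, unlabelled, rounds=5, per_round=2, threshold=0.6):
    """labelled: list of ((entity_tokens, context_tokens), label);
    unlabelled: list of (entity_tokens, context_tokens)."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(rounds):
        if not pool:
            break
        # Train one classifier per view on the current labelled set.
        views = [
            TokenNB().fit([x[v] for x, _ in labelled], [y for _, y in labelled])
            for v in (0, 1)
        ]
        picked = {}
        for v, clf in enumerate(views):
            scored = []
            for i, x in enumerate(pool):
                proba = clf.predict_proba(x[v])
                label = max(proba, key=proba.get)
                scored.append((proba[label], i, label))
            scored.sort(reverse=True)
            # Each view contributes its most confident predictions.
            for conf, i, label in scored[:per_round]:
                if conf >= threshold and i not in picked:
                    picked[i] = label
        if not picked:
            break
        labelled += [(pool[i], lab) for i, lab in picked.items()]
        pool = [x for i, x in enumerate(pool) if i not in picked]
    return labelled, pool


# Hypothetical toy data: (entity tokens, context tokens).
seed = [
    ((["joko", "widodo"], ["presiden", "berkata"]), "PERSON"),
    ((["jakarta"], ["di", "kota"]), "LOCATION"),
]
news = [
    (["susi", "pudjiastuti"], ["menteri", "berkata"]),
    (["surabaya"], ["di", "kota"]),
]
grown, remaining = co_train(seed, news)
```

In this toy run the context view is the one confident enough to label the two news examples, which then join the labelled set used in later rounds. In a fuller setup, the grown labelled set would then train the final classifier that is evaluated with 10-fold cross-validation.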


Bibliographic details

Source
International Seminar on Intelligent Technology and Its Applications | 2016 | xlii, 694 p. | 6 pages
Authors
Bayu Aryoyudanta; Teguh Bharata Adji; Indriana Hidayah
Original format
PDF
Chinese Library Classification
Fundamental theory of automation
Keywords
Semisupervised learning; Tagging; Support vector machines; Context; Training; Testing; Mathematical model


Similar literature

1. Named entity recognition: a semi-supervised learning approach [J]. H. Sintayehu, G. S. Lehal. International Journal of Information Technology. 2021, No. 4
2. TL-NER: A Transfer Learning Model for Chinese Named Entity Recognition [J]. DunLu Peng, YinRui Wang, Cong Liu. Information Systems Frontiers. 2020, No. 6
3. Learning to select pseudo labels: a semi-supervised method for named entity recognition [J]. Zhen-zhen LI, Da-wei FENG, Dong-sheng LI. Journal of Zhejiang University (English Edition), Series C: Computers and Electronics. 2020, No. 6
4. Semi-supervised learning approach for Indonesian Named Entity Recognition (NER) using co-training algorithm [C]. Bayu Aryoyudanta, Teguh Bharata Adji, Indriana Hidayah. International Seminar on Intelligent Technology and Its Applications. 2016
5. Semi-supervised Named Entity Recognition: Learning to recognize 100 entity types with little supervision [D]. Nadeau, David. 2007
6. A Weakly-Supervised Named Entity Recognition Machine Learning Approach for Emergency Medical Services Clinical Audit [O]. Han Wang, Wesley Lok Kin Yeung, Qin Xiang Ng. 2021
7. A semi-supervised learning approach to Arabic named entity recognition [O]. Althobaiti M, Kruschwitz U, Poesio M. 2013
