Semi-supervised learning approach for Indonesian Named Entity Recognition (NER) using co-training algorithm


Abstract

The main obstacle to applying machine learning to Indonesian Named Entity Recognition (NER) is the limited amount of labelled training data. Unlabelled data, by contrast, is widely available from many sources, which makes a semi-supervised learning approach a natural fit. This research designs a semi-supervised learning model for Indonesian NER. Co-training is used to exploit unlabelled data during learning: it produces new labelled data that can then be used to train an improved NER classifier. Two kinds of data are used: Indonesian DBPedia data as labelled data, and news article text from Indonesian news sites (kompas.com, cnnindonesia.com, tempo.co, merdeka.com and viva.co.id) as unlabelled data. The unstructured text is pre-processed by sentence segmentation, tokenization, stemming, and part-of-speech (PoS) tagging. The output of this pre-processing is the named entities and their contexts, which serve as the unlabelled data for the co-training process. A support vector machine (SVM) is used as the classification algorithm, and the system is evaluated with 10-fold cross-validation. The resulting NER system achieves a precision of 73.6%, a recall of 80.1%, and a mean F1 score of 76.5%.
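
The co-training step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The two views (entity tokens and context tokens), the toy Indonesian examples, the PERSON/LOCATION labels, the names `TinyNB` and `co_train`, and the parameters `rounds`, `per_round` and `threshold` are all hypothetical. A small Naive Bayes classifier stands in for the SVM used in the paper so that the example needs only the Python standard library.

```python
import math
from collections import Counter, defaultdict


class TinyNB:
    """Multinomial Naive Bayes over token lists.

    Assumption: used here as a stdlib-only stand-in for the paper's SVM.
    """

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = defaultdict(Counter)
        for tokens, y in zip(docs, labels):
            self.counts[y].update(tokens)
        self.vocab = {t for c in self.counts.values() for t in c}
        return self

    def predict_proba(self, tokens):
        # Log-space scores with add-one smoothing, normalised to probabilities.
        n = sum(self.prior.values())
        v = len(self.vocab) + 1  # one extra slot for unseen tokens
        scores = {}
        for y in self.classes:
            total = sum(self.counts[y].values())
            s = math.log(self.prior[y] / n)
            for t in tokens:
                s += math.log((self.counts[y][t] + 1) / (total + v))
            scores[y] = s
        top = max(scores.values())
        exp = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exp.values())
        return {y: e / z for y, e in exp.items()}


def co_train(labelled, unlabelled, rounds=5, per_round=2, threshold=0.6):
    """Grow the labelled set with confident pseudo-labels from two views.

    labelled:   list of ((view_a, view_b), label)
    unlabelled: list of (view_a, view_b)
    """
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(rounds):
        if not pool:
            break
        ys = [y for _, y in labelled]
        clf_a = TinyNB().fit([x[0] for x, _ in labelled], ys)
        clf_b = TinyNB().fit([x[1] for x, _ in labelled], ys)
        picked = {}  # pool index -> pseudo-label
        for clf, view in ((clf_a, 0), (clf_b, 1)):
            scored = []
            for i, x in enumerate(pool):
                proba = clf.predict_proba(x[view])
                label = max(proba, key=proba.get)
                scored.append((proba[label], i, label))
            scored.sort(reverse=True)
            # Each view contributes only its most confident predictions.
            for conf, i, label in scored[:per_round]:
                if conf >= threshold:
                    picked.setdefault(i, label)
        if not picked:
            break
        labelled += [(pool[i], y) for i, y in picked.items()]
        pool = [x for i, x in enumerate(pool) if i not in picked]
    return labelled, pool


# Toy data (hypothetical): view A = entity tokens, view B = context tokens.
seed = [
    ((["joko", "widodo"], ["presiden", "berkata"]), "PERSON"),
    ((["jakarta"], ["di", "kota"]), "LOCATION"),
]
pool = [
    (["susilo", "widodo"], ["presiden", "menyatakan"]),
    (["bandung"], ["di", "kota"]),
]
grown, remaining = co_train(seed, pool)
print(len(grown), "labelled examples;", len(remaining), "left in the pool")
```

In this sketch each view trains its own classifier on the current labelled set, and any example that either classifier labels with enough confidence moves from the unlabelled pool into the labelled set for the next round. The grown labelled set is what a final classifier would then be trained and cross-validated on.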


Bibliographic record

Source
International Seminar on Intelligent Technology and Its Applications | 2016 | pp. 7-12 | 6 pages
Authors
Bayu Aryoyudanta; Teguh Bharata Adji; Indriana Hidayah
Original format
PDF
Keywords
Semisupervised learning; Tagging; Support vector machines; Context; Training; Testing; Mathematical model

