Building the Indonesian NE Dataset Using Wikipedia and DBpedia with Entities Expansion Method on DBpedia


Abstract

Named Entity Recognition (NER) systems for Indonesian still need considerable improvement, even though NER is a core component of Information Extraction (IE) on which other, more advanced components depend. Building a reliable Indonesian NER system with a machine learning approach requires a large dataset, but a dataset tagged manually tends to be very small. A system was therefore created to build an Indonesian Named Entity (NE) dataset that is tagged automatically, using Wikipedia as the corpus source and DBpedia as the NE labeling reference, with the Entities Expansion method used to expand that reference. The existing system has three shortcomings: its automatic tagging cannot detect names that contain words beginning with a lowercase letter, it has not been tried with person entity gazetteers, and the rules of its DBpedia Entities Expansion method can still be modified to produce a better-quality NE labeling reference. This study builds a system to address these shortcomings. In the evaluation, the best Indonesian NE dataset built in this study produced an F1-score of 54.93%, 3.32 percentage points higher than the 51.61% of the previous study. This best dataset was built by adding a detection method to automatic tagging and using the modified DBpedia Entities Expansion rules from this study, but without adding person entity gazetteers.
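The abstract does not give implementation details, so the following is only a minimal sketch of what automatic tagging against a labeling reference could look like. It assumes a plain dictionary lookup with longest-match-first and BIO-style tags; the `LABEL_REFERENCE` entries, the tag names, and the `tag_tokens` function are all hypothetical and are not taken from the paper. Because whole token sequences are matched, a name containing a lowercase word (such as "bin") can still be tagged as one entity.

```python
from typing import Dict, List, Tuple

# Hypothetical labeling reference. In the paper this role is played by DBpedia
# after Entities Expansion; the entries and tag names below are invented.
LABEL_REFERENCE: Dict[Tuple[str, ...], str] = {
    ("Ahmad", "bin", "Hasan"): "PERSON",   # made-up name with a lowercase word
    ("Universitas", "Indonesia"): "ORGANIZATION",
    ("Jakarta",): "LOCATION",
}


def tag_tokens(
    tokens: List[str], reference: Dict[Tuple[str, ...], str]
) -> List[Tuple[str, str]]:
    """Assign BIO tags by longest-match lookup in the reference (an assumption)."""
    max_len = max((len(key) for key in reference), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = reference.get(tuple(tokens[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                matched = n
                break
        # Skip past the matched span, or move one token forward if no match.
        i += matched or 1
    return list(zip(tokens, tags))


if __name__ == "__main__":
    sentence = "Ahmad bin Hasan lahir di Jakarta".split()
    for token, tag in tag_tokens(sentence, LABEL_REFERENCE):
        print(token, tag)
```

Exact matching like this is the simplest possible choice; a real pipeline would also need tokenization, normalization, and disambiguation, none of which the abstract specifies.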


Bibliographic details

Source
International Conference on Asian Language Processing | 2018 | pp. 334-339 | 6 pages
Authors
Haji Dito Murya Alfarohmi; Moch. Arif Bijaksana
Original format: PDF
Keywords
Tagging; Labeling; Encyclopedias; Electronic publishing; Internet; Organizations


