Knowledge Discovery with CRF-Based Clustering of Named Entities without a Priori Classes


Abstract

Knowledge discovery aims to bring out coherent groups of entities. It is usually based on clustering, which requires defining a notion of similarity between the relevant entities. In this paper, we propose to repurpose a supervised machine learning technique (namely Conditional Random Fields, widely used for supervised labeling tasks) in order to calculate, indirectly and without supervision, similarities among text sequences. Our approach consists in generating artificial labeling problems on the data to reveal regularities between entities through their labeling. We describe how this framework can be implemented and evaluate it on two information extraction/discovery tasks. The results demonstrate the usefulness of this unsupervised approach and open many avenues for defining similarities for complex representations of textual data.
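
The idea of measuring similarity through artificial labeling problems can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the random label assignment, the half/half split, the number of rounds, and every name below (`train`, `predict`, `co_labeling_similarity`, the toy entities and their context features) are assumptions made for illustration. To stay self-contained, a simple feature-count classifier stands in for the Conditional Random Field.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations


def train(examples):
    """Count features per label: a crude stand-in for training a CRF."""
    counts = defaultdict(Counter)
    for features, label in examples:
        counts[label].update(features)
    return counts


def predict(counts, features):
    """Return the label whose training features overlap most with `features`."""
    # Ties are broken deterministically in favor of the smallest label.
    return max(sorted(counts), key=lambda lab: sum(counts[lab][f] for f in features))


def co_labeling_similarity(entities, n_rounds=200, n_labels=2, seed=0):
    """Estimate pairwise similarity as the agreement rate under random labelings."""
    rng = random.Random(seed)
    names = sorted(entities)
    same, seen = Counter(), Counter()
    for _ in range(n_rounds):
        order = rng.sample(names, len(names))
        half = len(order) // 2
        train_names, test_names = order[:half], order[half:]
        # Artificial problem (assumption): training labels are drawn at random.
        model = train([(entities[n], rng.randrange(n_labels)) for n in train_names])
        predicted = {n: predict(model, entities[n]) for n in test_names}
        # Two held-out entities that receive the same label count as one agreement.
        for a, b in combinations(sorted(test_names), 2):
            seen[a, b] += 1
            same[a, b] += predicted[a] == predicted[b]
    return {pair: same[pair] / seen[pair] for pair in seen}


# Hypothetical toy data: each entity is described by a set of context words.
toy = {
    "Paris": {"city", "in", "capital"},
    "London": {"city", "in", "river"},
    "Berlin": {"city", "in", "mayor"},
    "IBM": {"company", "shares", "ceo"},
    "Google": {"company", "shares", "search"},
    "Nokia": {"company", "shares", "profit"},
}
similarity = co_labeling_similarity(toy)
```

The returned dictionary maps alphabetically sorted name pairs to a score in [0, 1]. Entities that share context features tend to be labeled alike whatever the random labeling, so their score is high, and the resulting matrix could then be passed to any clustering algorithm.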


Bibliographic details

Source: Conference on Intelligent Text Processing and Computational Linguistics; CICLing 2014 | 2014 | 14 pages
Authors: Vincent Claveau; Abir Ncibi
Original format: PDF
CLC classification: TP18-532
Keywords: textual data; unsupervised approach; avenues


