Exploiting Structured Reference Data for Unsupervised Text Segmentation with Conditional Random Fields

Abstract

Text segmentation is the process of converting information in unstructured text into structured records. The problem matters because structured data supports efficient query processing. Conditional Random Fields (CRFs) are a class of discriminative probabilistic models that are gaining acceptance as an effective tool for text segmentation. Training a CRF requires learning its parameters from labeled training data, and labeling can be labor-intensive. The labeling step can be avoided by using structured reference tables whose data domain coincides with that of the input text to be segmented. In other words, the labels in training data drawn from reference tables "come for free". Inspired by recent work that uses reference tables to train HMMs, we developed an unsupervised technique for text segmentation with CRFs based on reference tables. Assuming that the text sequences to be segmented arrive in batches and that all sequences in a batch follow the same attribute order, we build a CRF model for each attribute in the reference table, use these models to determine the attribute order of a batch of input sequences, derive labeled training data from the reference table according to that order, and train a global CRF model to segment the input sequences in the batch. Preliminary experimental results indicate that our technique works well in practice.
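The batch pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under loudly stated assumptions, not the authors' implementation. The reference table and input strings are made-up examples. Each per-attribute CRF is replaced by a simple token relative-frequency model, which is a hypothetical stand-in. The batch's attribute order is picked by majority vote over per-sequence orders, which is also an assumption. The final step of training the global CRF is omitted. All names (`REFERENCE_TABLE`, `build_attribute_models`, `label_token`, `sequence_order`, `infer_batch_order`, `make_training_data`) are hypothetical.

```python
from collections import Counter

# Illustrative reference table (made-up data, not from the paper): each record
# maps an attribute name to a clean field value.
REFERENCE_TABLE = [
    {"name": "Luigi Pizza", "street": "12 Main St", "city": "Springfield"},
    {"name": "Blue Moon Cafe", "street": "400 Oak Ave", "city": "Shelbyville"},
    {"name": "Golden Wok", "street": "7 Elm St", "city": "Springfield"},
]


def tokenize(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


def build_attribute_models(table):
    """One token-count model per attribute.

    Hypothetical stand-in for the per-attribute CRF models in the abstract;
    a real system would train a CRF for each attribute instead.
    """
    models = {}
    for record in table:
        for attr, value in record.items():
            models.setdefault(attr, Counter()).update(tokenize(value))
    return models


def label_token(token, models):
    """Attribute with the highest relative token frequency, or None if unseen."""
    best = max(models, key=lambda a: models[a][token] / sum(models[a].values()))
    return best if models[best][token] > 0 else None


def sequence_order(tokens, models):
    """Order the attributes detected in one sequence by mean token position."""
    positions = {}
    for i, token in enumerate(tokens):
        attr = label_token(token, models)
        if attr is not None:  # unseen tokens give no ordering evidence here
            positions.setdefault(attr, []).append(i)
    return tuple(sorted(positions, key=lambda a: sum(positions[a]) / len(positions[a])))


def infer_batch_order(batch, models):
    """Majority vote, since all sequences in a batch are assumed to share one order."""
    votes = Counter(sequence_order(tokenize(text), models) for text in batch)
    return list(votes.most_common(1)[0][0])


def make_training_data(table, order):
    """Concatenate reference fields in the inferred order; labels come for free."""
    data = []
    for record in table:
        tokens, labels = [], []
        for attr in order:
            field_tokens = tokenize(record[attr])
            tokens.extend(field_tokens)
            labels.extend([attr] * len(field_tokens))
        data.append((tokens, labels))
    return data


models = build_attribute_models(REFERENCE_TABLE)
batch = ["Springfield 12 Main St Luigi Pizza", "Springfield 7 Elm St Golden Wok"]
order = infer_batch_order(batch, models)
training = make_training_data(REFERENCE_TABLE, order)
print(order)  # ['city', 'street', 'name']
print(training[0])
```

In this example both input strings put the city first, so the vote yields `['city', 'street', 'name']`. The first reference record then becomes the token sequence `springfield 12 main st luigi pizza`, labeled `city, street, street, street, name, name`. A real system would replace the frequency scorer with trained per-attribute CRFs and pass the synthesized (tokens, labels) pairs to a linear-chain CRF trainer to obtain the global model.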

Bibliographic record

Source: SIAM International Conference on Data Mining | 2008 | 869 p. | 12 pages in total
Authors: Chang Zhao; Jalal Mahmud; I. V. Ramakrishnan
Original format: PDF
Chinese Library Classification (CLC): TP274.2-53

Similar literature

1. AN UNSUPERVISED SEGMENTATION METHOD FOR REMOTE SENSING IMAGERY BASED ON CONDITIONAL RANDOM FIELDS [J]. A. R. Soares, T. S. Körting, L. M. G. Fonseca. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2020, no. 4.
2. Unsupervised SAR image segmentation using high-order conditional random fields model based on product-of-experts [J]. Zhang Peng, Li Ming, Wu Yan. Pattern Recognition Letters, 2016, Jul. 15 issue.
3. A conditional random field approach to unsupervised texture image segmentation [J]. Li C.-T. EURASIP Journal on Advances in Signal Processing, 2010, no. 17.
4. Exploiting Structured Reference Data for Unsupervised Text Segmentation with Conditional Random Fields [C]. Chang Zhao, Jalal Mahmud, I. V. Ramakrishnan. SIAM International Conference on Data Mining, 2008.
5. Unsupervised segmentation of noisy and textured images modeled by Markov random fields [D]. Gregoriou, Georghios Kyriacos. 1992.
6. Left ventricular segmentation from MRI datasets with edge modelling conditional random fields [O]. Janto F Dreijer, Ben M Herbst, Johan A du Preez. 2013.
7. Exploiting Structured Reference Data for Unsupervised Text Segmentation with Conditional Random Fields [O]. Chang Zhao, Jalal Mahmud, I. V. Ramakrishnan. 2008.
