A Probabilistic Address Parser Using Conditional Random Fields and Stochastic Regular Grammar

Abstract

Automatic semantic annotation of data from databases or the web is an important pre-processing step for data cleansing and record linkage. It can be used to resolve imperfect field alignment in a database or to identify comparable fields for matching records from multiple sources. The annotation process is not trivial because data values may be noisy, containing abbreviations, variations, or misspellings. In particular, overlapping features usually exist in a lexicon-based approach. In this work, we present a probabilistic address parser based on linear-chain conditional random fields (CRFs), which allow more expressive token-level features than hidden Markov models (HMMs). In addition, we propose two general enhancement techniques to improve performance. The first takes the original semi-structure of the data into account. The second post-processes the parser's output sequences by combining their conditional probability with a score function based on a learned stochastic regular grammar (SRG) that captures segment-level dependencies. Experiments compared the CRF parser with an HMM parser and a semi-Markov CRF parser on two real-world datasets. The CRF parser outperformed the HMM parser and the semi-Markov CRF parser on both datasets in terms of classification accuracy. Leveraging the structure of the data and combining the linear-chain CRF with the SRG further improved the parser, which achieved an accuracy of 97% on a postal dataset and 96% on a company dataset.
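The post-processing idea in the abstract (combining the parser's conditional probability with a segment-level grammar score) can be illustrated with a minimal sketch. This is not the authors' implementation: the label set, the grammar probabilities, the candidate sequences, the `weight` parameter, and the function names are all hypothetical, and the grammar is approximated here by transition probabilities between segment labels.

```python
import math
from itertools import groupby

# Hypothetical segment-level grammar: probability of moving from one
# segment label to the next ("START"/"END" mark the sequence boundaries).
# In practice these values would be learned from labelled data.
SRG = {
    ("START", "NUMBER"): 0.9,
    ("START", "STREET"): 0.1,
    ("NUMBER", "STREET"): 1.0,
    ("STREET", "CITY"): 0.8,
    ("STREET", "NUMBER"): 0.2,
    ("CITY", "POSTCODE"): 0.7,
    ("CITY", "END"): 0.3,
    ("POSTCODE", "END"): 1.0,
}


def segments(labels):
    """Collapse token-level labels into a segment-level sequence.

    Simplification: adjacent tokens with the same label are always
    merged into a single segment.
    """
    return [label for label, _ in groupby(labels)]


def srg_log_prob(labels, floor=1e-6):
    """Log-probability of the segment sequence under the toy grammar.

    Transitions missing from the grammar get a small floor probability.
    """
    path = ["START"] + segments(labels) + ["END"]
    return sum(math.log(SRG.get(pair, floor)) for pair in zip(path, path[1:]))


def rerank(candidates, weight=1.0):
    """Pick the candidate with the best combined score.

    candidates: list of (labels, crf_log_prob) pairs, e.g. an n-best
    list produced by a linear-chain CRF (assumed, not computed here).
    """
    return max(candidates, key=lambda c: c[1] + weight * srg_log_prob(c[0]))


# Made-up n-best list for the tokens "12 High Street London".
candidates = [
    (["NUMBER", "STREET", "STREET", "STREET"], math.log(0.55)),
    (["NUMBER", "STREET", "STREET", "CITY"], math.log(0.45)),
]

best = rerank(candidates)
print(best[0])
```

In this toy example the first candidate has the higher CRF probability, but it ends on a STREET segment, which the hypothetical grammar considers very unlikely, so the combined score selects the second candidate.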

Bibliographic details

Source: IEEE International Conference on Data Mining Workshops, 2016, pp. 225-232 (8 pages)
Authors: Minlue Wang; Valeriia Haberland; Amos Yeo; Andrew Martin; John Howroyd; J. Mark Bishop
Original format: PDF
Keywords: Hidden Markov models; Roads; Semantics; Stochastic processes; Grammar; Databases; Couplings

Similar literature

1. Parsing fashion image into mid-level semantic parts based on chain-conditional random fields [J]. Wang Fan, Zhao Qiyang, Yin Baolin, Image Processing, IET. 2016, No. 6
2. Combining compound recognition and PCFG-LA parsing with word lattices and conditional random fields [J]. Mariana Damova, Computing Reviews. 2013, No. 12
3. Parsing citations in biomedical articles using conditional random fields [J]. Zhang Q, Cao YG, Yu H, Computers in Biology and Medicine. 2011, No. 4
4. A Probabilistic Address Parser Using Conditional Random Fields and Stochastic Regular Grammar [C]. Minlue Wang, Valeriia Haberland, Amos Yeo, IEEE International Conference on Data Mining Workshops. 2016
5. Selected Topics in Spatial Statistical Analysis: Nonstationary Vector Kriging, Large Scale Conditional Simulation of Three-Dimensional Gaussian Random Fields, and Hypothesis Testing in a Correlated Random Field [D]. Quimby, William F. 1986
6. Sparse reconstruction of compressive sensing MRI using cross-domain stochastically fully connected conditional random fields [O]. Edward Li, Farzad Khalvati, Mohammad Javad Shafiee, 2016
7. A Probabilistic Address Parser Using Conditional Random Fields and Stochastic Regular Grammar [O]. Wang, Minlue, Haberland, Valeriia, Yeo, Amos, 2016
