TCtract-A Collocation Extraction Approach for Noun Phrases Using Shallow Parsing Rules and Statistic Models

Abstract

This paper presents a hybrid method for extracting Chinese noun phrase collocations that combines a statistical model with rule-based linguistic knowledge. The algorithm first extracts all candidate noun phrase collocations from a shallow-parsed corpus using syntactic knowledge encoded as phrase rules. It then removes pseudo-collocations using a set of statistics-based association measures (AMs) as filters. The hybrid algorithm is designed with two main goals: (1) to improve precision while maintaining reasonable recall, and (2) to evaluate the proposed association measures on Chinese noun phrase collocations. Its performance is compared with that of a purely statistical model and a purely rule-based method on a 60 MB POS-tagged corpus. Experimental results on 29 randomly selected noun headwords show that the hybrid method achieves a precision of 92.65% and a recall of 47%, compared with a precision of 78.87% and a recall of 27.19% for a statistics-based extraction system. The F-score improvement is 55.7%.
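The two-stage pipeline (rule-based candidate extraction, then association-measure filtering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' TCtract implementation: the phrase rules, the PKU-style tags (`a`, `n`, `v`), adjacent-token matching in place of real shallow-parse chunks, and pointwise mutual information (PMI) as the single filtering AM are all hypothetical choices; the abstract does not say which rules or AMs the paper uses.

```python
import math
from collections import Counter

# Hypothetical phrase rules mapping a (modifier tag, headword tag) pair to a
# collocation type. The PKU-style tags (a = adjective, n = noun, v = verb) and
# the rules themselves are illustrative assumptions, not the paper's rule set.
PHRASE_RULES = {
    ("a", "n"): "AN",  # adjective + noun
    ("n", "n"): "NN",  # noun + noun
    ("v", "n"): "VN",  # verb + noun
}


def extract_candidates(sentences):
    """Stage 1: match phrase rules against adjacent (word, tag) tokens.

    Adjacent-token matching stands in for real shallow-parse chunks here.
    Returns a Counter of (modifier, headword, type) -> frequency.
    """
    candidates = Counter()
    for sent in sentences:
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            ctype = PHRASE_RULES.get((t1, t2))
            if ctype is not None:
                candidates[(w1, w2, ctype)] += 1
    return candidates


def pmi(pair_freq, freq1, freq2, n_tokens):
    """Pointwise mutual information: one example of an association measure."""
    return math.log2(pair_freq * n_tokens / (freq1 * freq2))


def filter_candidates(sentences, candidates, min_freq=2, min_pmi=1.0):
    """Stage 2: drop pseudo-collocations that are too rare or weakly associated.

    min_freq and min_pmi are hypothetical thresholds. Returns a dict mapping
    each surviving (modifier, headword, type) triple to its PMI score.
    """
    word_freq = Counter(word for sent in sentences for word, _ in sent)
    n_tokens = sum(word_freq.values())
    kept = {}
    for (w1, w2, ctype), freq in candidates.items():
        if freq < min_freq:
            continue
        score = pmi(freq, word_freq[w1], word_freq[w2], n_tokens)
        if score >= min_pmi:
            kept[(w1, w2, ctype)] = score
    return kept


# Toy POS-tagged corpus: 重要 "important", 问题 "problem", 是 "is", 经济 "economy".
corpus = [
    [("重要", "a"), ("问题", "n")],
    [("重要", "a"), ("问题", "n"), ("是", "v"), ("经济", "n")],
    [("经济", "n"), ("问题", "n")],
]
candidates = extract_candidates(corpus)
collocations = filter_candidates(corpus, candidates)
print(collocations)
```

On this toy corpus the rules yield three candidates, but only the adjective-noun pair 重要 问题 survives the filter: the verb-noun and noun-noun candidates each occur once and fall below `min_freq`. In a setup like this, raising the thresholds would discard more pseudo-collocations at some cost in recall, which is the precision/recall trade-off the abstract describes.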

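As a quick check on the reported results, the precision and recall figures can be plugged into the balanced F-measure, F = 2PR / (P + R). The abstract does not say which F-score variant it uses, so the balanced F1 below is an assumption.

```python
def f_score(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (as fractions).
hybrid = f_score(0.9265, 0.47)      # hybrid rule + AM method
baseline = f_score(0.7887, 0.2719)  # statistics-based extraction system
relative_gain = hybrid / baseline - 1

print(f"hybrid F = {hybrid:.4f}, baseline F = {baseline:.4f}, "
      f"relative gain = {relative_gain:.1%}")
```

With the rounded values this gives F-scores of about 0.624 and 0.404, a relative gain of roughly 54%, a little below the stated 55.7%; the difference may stem from rounding (recall is given only as 47%) or from how the full paper defines the F-score.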

Bibliographic Record

Source: Pacific Asia Conference on Language, Information and Computation, 2006, 8 pages
Conference location: Wuhan, China (1-3 November 2006)
Authors: Wan Yin Li; Qin Lu; James Liu
Conference organizers: National Natural Science Foundation of China; Ministry of Education of China; Chinese Information Processing Society of China; Huazhong Normal University
Original format: PDF
Chinese Library Classification: Computer networks
Keywords: collocation extraction; typed collocations; phrase rules; association measures
Date added to database: 2022-08-21 10:15:02

Similar Documents

1. Parsing Noun Phrases in the Penn Treebank [J]. David Vadas, James R. Curran. Computational Linguistics, 2011, Issue 4.
2. Concept symbols revisited: Naming clusters by parsing and filtering of noun phrases from citation contexts of concept symbols [J]. Jesper W. Schneider. Scientometrics, 2006, Issue 3.
3. Concept symbols revisited: Naming clusters by parsing and filtering of noun phrases from citation contexts of concept symbols [J]. Jesper W. Schneider. Scientometrics, 2006, Issue 3.
4. TCtract-A Collocation Extraction Approach for Noun Phrases Using Shallow Parsing Rules and Statistic Models [C]. Wan Yin Li, Qin Lu, James Liu. Pacific Asia Conference on Language, Information and Computation, 1-3 November 2006, Wuhan, China. 2006.
5. Noun phrases in documents: Preprocessing, automatic extraction, and statistical analysis in different categories of text [D]. Youngin Kim. 2002.
6. Improved Identification of Noun Phrases in Clinical Radiology Reports Using a High-Performance Statistical Natural Language Parser Augmented with the UMLS Specialist Lexicon [O]. Yang Huang, Henry J. Lowe, Dan Klein, 2005.
7. TCtract-A Collocation Extraction Approach for Noun Phrases Using Shallow Parsing Rules and Statistic Models [O]. Wan Yin Li, Qin Lu, James Liu. 2006.
8. Equipment Model and Its Role in the Interpretation of Noun Phrases [R]. T. Ksiezyk, R. Grishman, J. Sterling. 1987.
