TCtract-A Collocation Extraction Approach for Noun Phrases Using Shallow Parsing Rules and Statistic Models

Abstract

This paper presents a hybrid method for extracting Chinese noun phrase collocations that combines a statistical model with rule-based linguistic knowledge. The algorithm first extracts all noun phrase collocations from a shallow-parsed corpus using syntactic knowledge in the form of phrase rules. It then removes pseudo-collocations by using a set of statistics-based association measures (AMs) as filters. The hybrid algorithm is designed with two main purposes: (1) to maintain a reasonable recall while improving precision, and (2) to investigate the proposed association measures on Chinese noun phrase collocations. Its performance is compared with that of a pure statistical model and a pure rule-based method on a 60 MB PoS-tagged corpus. The experimental results, based on 29 randomly selected noun headwords, show that the proposed hybrid method achieves a precision of 92.65% and a recall of 47%, compared with a precision of 78.87% and a recall of 27.19% for a statistics-based extraction system. The F-score improvement is 55.7%.
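The two-stage idea in the abstract (rule-based candidate extraction followed by a statistical filter) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the tag names, the two phrase rules in `NP_RULES`, the use of pointwise mutual information as the single association measure, the thresholds, and the English toy corpus standing in for a shallow-parsed Chinese corpus.

```python
import math
from collections import Counter

# Hypothetical phrase rules: (modifier tag, head tag) patterns treated as noun phrases.
NP_RULES = {("ADJ", "NOUN"), ("NOUN", "NOUN")}

def extract_candidates(tagged_sentences):
    """Stage 1: collect adjacent word pairs whose tags match a phrase rule."""
    pair_counts = Counter()
    word_counts = Counter()
    for sentence in tagged_sentences:
        for word, _tag in sentence:
            word_counts[word] += 1
        for (w1, t1), (w2, t2) in zip(sentence, sentence[1:]):
            if (t1, t2) in NP_RULES:
                pair_counts[(w1, w2)] += 1
    return pair_counts, word_counts

def pmi(pair, pair_counts, word_counts):
    """Pointwise mutual information, used here as a stand-in association measure."""
    total_pairs = sum(pair_counts.values())
    total_words = sum(word_counts.values())
    p_xy = pair_counts[pair] / total_pairs
    p_x = word_counts[pair[0]] / total_words
    p_y = word_counts[pair[1]] / total_words
    return math.log2(p_xy / (p_x * p_y))

def filter_collocations(pair_counts, word_counts, min_pmi, min_count=2):
    """Stage 2: drop candidates that are too rare or too weakly associated."""
    kept = {}
    for pair, count in pair_counts.items():
        if count < min_count:
            continue
        score = pmi(pair, pair_counts, word_counts)
        if score >= min_pmi:
            kept[pair] = score
    return kept

# Toy PoS-tagged corpus (illustrative only).
TOY_CORPUS = [
    [("strong", "ADJ"), ("tea", "NOUN")],
    [("strong", "ADJ"), ("tea", "NOUN"), ("is", "VERB"), ("good", "ADJ")],
    [("big", "ADJ"), ("dog", "NOUN")],
    [("the", "DET"), ("big", "ADJ"), ("tea", "NOUN")],
]

candidates, word_counts = extract_candidates(TOY_CORPUS)
kept = filter_collocations(candidates, word_counts, min_pmi=3.0, min_count=2)
```

The frequency floor `min_count` is included because pointwise mutual information tends to overrate pairs seen only once. A real system would likely combine several association measures, as the abstract indicates.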

Bibliographic details

Source
Pacific Asia Conference on Language, Information and Computation; 1-3 November 2006; Wuhan (CN) | 2006 | pp. 109-116 | 8 pages
Conference location: Wuhan (CN)
Authors
Wan Yin Li; Qin Lu; James Liu
Author affiliation

Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

Conference organizer
Original format: PDF
Language of text: English (eng)
Chinese Library Classification: Computer networks
Keywords
collocation extraction; typed collocations; phrase rules; association measures;

Date indexed: 2022-08-26 14:20:59

Similar literature

1. Parsing Noun Phrases in the Penn Treebank [J] . David Vada, James R. Curra Computational linguistics . 2011, No. 4

2. Concept symbols revisited: Naming clusters by parsing and filtering of noun phrases from citation contexts of concept symbols [J] . Jesper W. Schneider Scientometrics . 2006, No. 3

3. Concept symbols revisited: Naming clusters by parsing and filtering of noun phrases from citation contexts of concept symbols [J] . Jesper W. Schneider Scientometrics . 2006, No. 3

4. TCtract-A Collocation Extraction Approach for Noun Phrases Using Shallow Parsing Rules and Statistic Models [C] . Wan Yin Li, Qin Lu, James Liu, Pacific Asia Conference on Language, Information and Computation . 2006

5. Noun phrases in documents: Preprocessing, automatic extraction, and statistical analysis in different categories of text. [D] . Kim, Youngin. 2002

6. Improved Identification of Noun Phrases in Clinical Radiology Reports Using a High-Performance Statistical Natural Language Parser Augmented with the UMLS Specialist Lexicon [O] . Yang Huang, Henry J. Lowe, Dan Klein, 2005

7. TCtract-A Collocation Extraction Approach for Noun Phrases Using Shallow Parsing Rules and Statistic Models [O] . Li Wan Yin, Lu Qin, Liu James 2006

8. Equipment Model and Its Role in the Interpretation of Noun Phrases [R] . Ksiezyk, T., Grishman, R., Sterling, J. 1987

机译：设备模型及其在名词短语解释中的作用

TCtract-A Collocation Extraction Approach for Noun Phrases Using Shallow Parsing Rules and Statistic Models

摘要

著录项

相似文献

相关主题

期刊订阅