
Automatic acquisition of lexical semantic knowledge from large corpora: The identification of semantically related words, markedness, polarity, and antonymy.



Abstract

Lexical semantic knowledge is useful, even indispensable, for many natural language processing applications. Yet, traditional approaches for acquiring this knowledge manually are expensive and cannot easily handle the requisite domain dependence. In this dissertation, I address four closely related problems from lexical semantics, describing a fully automatic system that extracts information about semantic groups and scales from large free-text corpora. The system forms groups of semantically related terms such as {cold, warm, hot}, {final, preliminary}, and {court, jury, law, regulation}. Using gradability indicators, it identifies those of the groups that are actually linguistic scales, i.e., contain terms that can be linearly ordered on the basis of semantic strength. Scalar groups are further partitioned into two subgroups according to evaluative orientation, distinguishing between positively loaded terms (e.g., beautiful, ingenious, unbiased) and their negative counterparts (e.g., ugly, plain, lazy). Finally, the semantic orientation of each subgroup is identified. Combining the above four stages results in an automatic method for the retrieval of possibly domain-dependent pairs of antonyms. All this information is actively learned from the corpus; the system does not access any type of stored information about words such as dictionaries, thesauri, or similar databases. I have adopted a statistical approach that combines both supervised and unsupervised learning methods and is informed by linguistic models of the data and the tasks at hand. I rely on robust, non-parametric statistical methods; multiple knowledge sources justified by linguistic analyses; and shallow syntactic and morphological processing during information extraction.
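The four stages described above can be sketched as a pipeline. This is only an illustrative skeleton: the group data, the `GRADABLE` and `POSITIVE` lexicons, and all function names are invented stand-ins for the corpus-derived statistical evidence the actual system computes.

```python
# Hypothetical sketch of the four-stage pipeline, NOT the dissertation's
# actual implementation. Toy lexicons stand in for corpus statistics.

# Stage 1 output: groups of semantically related terms.
groups = [
    {"cold", "warm", "hot"},
    {"final", "preliminary"},
    {"court", "jury", "law", "regulation"},
]

# Toy evidence: in the real system these are derived from the corpus.
GRADABLE = {"cold", "warm", "hot", "final", "preliminary"}  # gradability indicators
POSITIVE = {"warm", "hot", "final"}                         # positive orientation

def is_scale(group):
    """Stage 2: treat a group as a linguistic scale if all members are gradable."""
    return all(term in GRADABLE for term in group)

def partition_by_orientation(group):
    """Stage 3: split a scalar group into positive and negative subgroups."""
    pos = {t for t in group if t in POSITIVE}
    return pos, group - pos

def antonym_pairs(groups):
    """Stage 4: pair terms across the two orientations of each scale."""
    pairs = []
    for g in groups:
        if is_scale(g):
            pos, neg = partition_by_orientation(g)
            pairs.extend((p, n) for p in pos for n in neg)
    return pairs
```

Under these toy lexicons, `antonym_pairs(groups)` pairs `warm`/`cold`, `hot`/`cold`, and `final`/`preliminary`, while the non-gradable group {court, jury, law, regulation} is correctly skipped at the scale-detection stage.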
I describe and justify the linguistic sources, and present the results (sometimes quite unexpected) of experimental studies designed to validate related hypotheses made in the linguistics literature. I also present a novel evaluation method that simultaneously employs multiple reference models without inducing a single "best" model, along with results produced for several collections of adjectives and nouns. Finally, I present evidence of the strengths of the hybrid linguistic-statistical approach, and discuss applications of the system's output to language problems.
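One way to evaluate against multiple reference models without collapsing them into a single "best" one is to score the system's grouping against each reference separately and report the full set of scores. The pairwise-F1 metric and the toy data below are illustrative assumptions, not the dissertation's actual evaluation method.

```python
# Illustrative multi-reference evaluation sketch (hypothetical metric,
# not the dissertation's): score a system grouping against each human
# reference model separately instead of inducing one "best" reference.
from itertools import combinations

def pair_set(partition):
    """All unordered word pairs placed in the same group."""
    return {frozenset(p) for group in partition
            for p in combinations(sorted(group), 2)}

def f1_against(system, reference):
    """Pairwise precision/recall F1 of a system grouping vs one reference."""
    sys_pairs, ref_pairs = pair_set(system), pair_set(reference)
    tp = len(sys_pairs & ref_pairs)
    if not tp:
        return 0.0
    precision = tp / len(sys_pairs)
    recall = tp / len(ref_pairs)
    return 2 * precision * recall / (precision + recall)

system = [{"cold", "cool"}, {"hot", "warm"}]
references = [
    [{"cold", "cool"}, {"hot", "warm"}],  # annotator 1
    [{"cold", "cool", "warm"}, {"hot"}],  # annotator 2
]

# One score per reference model; no single model is privileged.
scores = [f1_against(system, ref) for ref in references]
```

Here the system agrees perfectly with the first annotator (F1 = 1.0) but only partially with the second (F1 = 0.4), and both numbers are reported rather than averaged away or resolved in favor of one annotator.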
