Conference proceedings: Principles of data mining and knowledge discovery

Using Loglinear Clustering for Subcategorization Identification

Abstract

In this paper we describe a process for mining syntactic verbal subcategorization, i.e. the information about the kinds of phrases or clauses a verb takes. Our resource material is a large text corpus of almost 10,000,000 tagged words. Loglinear modeling is used to analyze and automatically identify the subcategorization dependencies, and an unsupervised clustering algorithm is used to accurately determine verbal subcategorization frames. We address only verbal subcategorization of noun phrases and prepositional phrases. On an evaluation sample of 81 Portuguese verbs, we obtained 97% precision and 99% recall for noun phrases, and 92% precision and 100% recall for prepositional phrases.
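The abstract does not spell out the loglinear model, so the following is only a minimal sketch of one common way to test a verb/phrase dependency: fitting the independence loglinear model to a 2x2 contingency table and measuring its lack of fit with the likelihood-ratio statistic G². The function name `g2_statistic`, the table layout, and the counts are hypothetical and are not taken from the paper.

```python
import math


def g2_statistic(table):
    """Likelihood-ratio statistic G^2 for the independence model on a 2-way table.

    table[i][j] holds observed co-occurrence counts, e.g. rows = verb
    present/absent, columns = prepositional phrase present/absent
    (an assumed layout, chosen for illustration).
    """
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_sums[i] * col_sums[j] / total
            if observed > 0:  # a zero cell contributes nothing to the sum
                g2 += 2.0 * observed * math.log(observed / expected)
    return g2


# Hypothetical counts, invented for illustration only.
counts = [[30, 10],
          [10, 30]]
print(g2_statistic(counts))
```

With one degree of freedom, a G² value above roughly 3.84 would reject independence at the 5% level, which is one way a dependency between a verb and a phrase type could be flagged before any clustering step.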