Conference on Empirical Methods in Natural Language Processing

Extracting Clusters of Specialist Terms from Unstructured Text



Abstract

Automatically identifying related specialist terms is a difficult and important task required to understand the lexical structure of language. This paper develops a corpus-based method of extracting coherent clusters of satellite terminology (terms on the edge of the lexicon) using co-occurrence networks of unstructured text. Term clusters are identified by extracting communities in the co-occurrence graph, after which the largest community is discarded and the remaining words are ranked by centrality within their community. The method is tractable on large corpora and requires no document structure and only minimal normalization. The results suggest that the model is able to extract coherent groups of satellite terms in corpora of varying size, content and structure. The findings also confirm that language consists of a densely connected core (observed in dictionaries) and systematic, semantically coherent groups of terms at the edges of the lexicon.
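The pipeline described in the abstract (build a co-occurrence graph, extract communities, discard the largest, rank the rest by centrality) can be illustrated with a minimal sketch. The abstract does not say which community-detection algorithm or centrality measure the paper uses, so the choices here are assumptions made only for illustration: a simple label-propagation detector and weighted degree within a community. The function names and the `window` parameter are likewise hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence_graph(sentences, window=2):
    """Undirected weighted graph: graph[a][b] counts how often tokens a and b
    occur at most `window` positions apart in the same sentence."""
    graph = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    graph[word][other] += 1
                    graph[other][word] += 1
    return graph


def label_propagation(graph, max_iter=20):
    """Assumed community detector (not necessarily the paper's): each node
    repeatedly adopts the label carrying the most edge weight among its
    neighbours, keeping its current label on ties."""
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):  # fixed order keeps the run deterministic
            votes = Counter()
            for neighbour, weight in graph[node].items():
                votes[labels[neighbour]] += weight
            if not votes:
                continue
            top = max(votes.values())
            best = sorted(label for label, w in votes.items() if w == top)
            if labels[node] not in best:
                labels[node] = best[0]
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return list(communities.values())


def satellite_clusters(graph, communities):
    """Drop the largest community, then rank each remaining community's words
    by weighted degree counted only over edges inside that community
    (an assumed stand-in for the paper's centrality measure)."""
    if not communities:
        return []
    largest = max(communities, key=len)
    ranked = []
    for community in communities:
        if community is largest:
            continue
        score = {
            word: sum(w for other, w in graph[word].items() if other in community)
            for word in community
        }
        ranked.append(sorted(community, key=lambda word: (-score[word], word)))
    return ranked
```

Given tokenized sentences, calling `satellite_clusters(g, label_propagation(g))` with `g = cooccurrence_graph(sentences)` returns one ranked word list per remaining community. On a real corpus the discarded largest community would play the role of the densely connected core that the abstract mentions.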
