Conference: ACM/IEEE Joint Conference on Digital Libraries

Phrases as Subtopical Concepts in Scholarly Text

Abstract

Retrieval of subtopical concepts from scholarly communication systems is now possible through a combination of text and metadata analysis, augmented by user search queries and click logs. Here we investigate how a "phrase", defined as a variable-length sequence of vocabulary words, can be used to represent a concept. We present a method to extract such phrases from a text corpus and rank them using a citation network measure, the compensated normalized link count (CNLC), which measures the extent to which they are propagated along the citation structure of articles. We validate the ranking with actively and passively determined metrics: comparison with human-assigned keywords, and comparison with passively harvested terms from search query logs. This method is demonstrated on full texts and abstracts from 7 years of high energy physics articles from the arXiv preprint database.
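
The abstract does not give the extraction procedure or the CNLC formula, so the following is only a minimal sketch of the general idea, under stated assumptions. It treats a phrase as any contiguous run of up to `max_len` vocabulary words, and scores each phrase by the fraction of citation links whose citing and cited articles both contain it. That score is a simple stand-in and is not the CNLC: the normalization and compensation steps are not reproduced here. The names `extract_phrases`, `link_fraction`, `max_len`, and the toy documents are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Phrase = Tuple[str, ...]


def extract_phrases(tokens: List[str], vocabulary: Set[str],
                    max_len: int = 3) -> Set[Phrase]:
    """Return every contiguous run of 1..max_len vocabulary words in tokens.

    Assumption: this n-gram style definition is illustrative only.
    """
    phrases: Set[Phrase] = set()
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            window = tokens[start:start + length]
            if len(window) < length:
                break  # ran past the end of the document
            if any(word not in vocabulary for word in window):
                break  # longer windows from this start include the same word
            phrases.add(tuple(window))
    return phrases


def link_fraction(doc_phrases: Dict[str, Set[Phrase]],
                  citations: Iterable[Tuple[str, str]]) -> Dict[Phrase, float]:
    """Fraction of (citing, cited) links where both articles share the phrase.

    Assumption: an uncompensated stand-in score, not the paper's CNLC.
    """
    links = list(citations)
    shared: Counter = Counter()
    for citing, cited in links:
        common = doc_phrases.get(citing, set()) & doc_phrases.get(cited, set())
        shared.update(common)
    if not links:
        return {}
    return {phrase: count / len(links) for phrase, count in shared.items()}


# Toy corpus: three articles and two citation links (B cites A, C cites A).
vocabulary = {"dark", "matter", "halo", "gauge", "theory"}
docs = {
    "A": "dark matter halo".split(),
    "B": "the dark matter problem".split(),
    "C": "gauge theory".split(),
}
citations = [("B", "A"), ("C", "A")]

doc_phrases = {name: extract_phrases(toks, vocabulary) for name, toks in docs.items()}
scores = link_fraction(doc_phrases, citations)
```

In this toy corpus, the phrase `("dark", "matter")` appears at both ends of one of the two links, while `("gauge", "theory")` appears only in a citing article and so receives no score. A real ranking along the lines of the abstract would additionally need the CNLC normalization and compensation, which are not specified here.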