Journal: Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology

Ranking concrete and abstract words using Google Books Ngram data

Abstract

Creating dictionaries of abstract and concrete words is a well-known task. Such dictionaries are important in several applications of text analysis and computational linguistics. Usually, assembling concreteness scores for words begins with a lot of manual work. However, the process can be automated to a significant degree using information from large corpora. In this paper we combine two datasets, a dictionary with concreteness scores for 40,000 English words and the Google Books Ngram dataset, in order to test the following hypothesis: in text, concrete words tend to co-occur with concrete words more often than with abstract words (and conversely, abstract words tend to co-occur with abstract words more often than with concrete words). Using this hypothesis, we propose a method for automatically estimating the concreteness scores of words from a small amount of initial markup.
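
The abstract does not spell out the scoring procedure, so the following is only a minimal sketch of how the co-occurrence hypothesis could be turned into an estimator. It assumes a count-weighted mean over the seed scores of a word's bigram neighbours. The function name estimate_concreteness, the toy bigram counts, the seed words, and the 1 to 5 score scale are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def estimate_concreteness(bigram_counts, seed_scores):
    """Estimate a score for each word as the count-weighted mean of the
    seed scores of the words it co-occurs with (illustrative assumption,
    not the authors' confirmed method)."""
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for (left, right), count in bigram_counts.items():
        # Each bigram contributes evidence in both directions.
        for word, neighbour in ((left, right), (right, left)):
            if neighbour in seed_scores:
                weighted_sum[word] += count * seed_scores[neighbour]
                total_weight[word] += count
    # Only words with at least one scored neighbour receive an estimate.
    return {w: weighted_sum[w] / total_weight[w] for w in total_weight}


# Toy data invented for illustration: a small "initial markup" of seed words
# (assumed 1 = abstract, 5 = concrete) and made-up bigram frequencies.
seed_scores = {"wooden": 4.5, "metal": 4.8, "moral": 1.5, "political": 1.8}
bigram_counts = {
    ("wooden", "table"): 30,
    ("metal", "table"): 10,
    ("moral", "freedom"): 20,
    ("political", "freedom"): 20,
}

estimates = estimate_concreteness(bigram_counts, seed_scores)
for word, score in sorted(estimates.items()):
    print(word, round(score, 3))
```

In this toy run, "table" co-occurs only with concrete seed words and "freedom" only with abstract ones, so "table" receives the higher estimate. With real Google Books Ngram counts, one would likely also need frequency thresholds and normalisation, which this sketch leaves out.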
