Conference paper, International Conference on Advanced Computer Science and Information Systems

Extraction of Lexical Bundles used in Natural Language Processing Articles

Abstract

Lexical bundles are indispensable for fluent academic writing. They might not constitute complete structural units, but they occur very frequently in academic conversations, conference presentations and scientific articles. This paper shows how to collect a large database of lexical bundles from articles in the Natural Language Processing (NLP) domain. We first collect highly frequent N-grams from the ACL-ARC collection of NLP articles and then classify them as true or false lexical bundles using machine learning models trained on a set of manually checked bundles. In a verification experiment, our best model achieves an accuracy of 76%. Using this model, we extract more than 18,000 lexical bundles from the ACL-ARC corpus, and we release this collection publicly.
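The two-stage pipeline described above (frequent N-gram extraction, then supervised true/false classification) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the N-gram length, frequency cutoffs, features, or model type, so the following are all hypothetical: the 4-gram default (a common choice in lexical-bundle studies), the `min_freq`/`min_docs` thresholds, the `FUNCTION_WORDS` list, and the `features()` function. The perceptron is swapped in as a simple stand-in classifier.

```python
import re
from collections import Counter

# Hypothetical function-word list, used only by the illustrative features below.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "to", "for", "and",
                  "we", "this", "that", "is", "are", "as", "with", "by"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (deliberately simple)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def frequent_ngrams(docs, n=4, min_freq=2, min_docs=2):
    """Step 1: count n-grams over the corpus and keep candidates that are
    frequent overall and spread over at least `min_docs` documents.
    The thresholds are placeholders, not the paper's settings."""
    freq, doc_range = Counter(), Counter()
    for doc in docs:
        tokens = tokenize(doc)
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        doc_range.update(set(grams))  # count each n-gram once per document
    return {g: c for g, c in freq.items()
            if c >= min_freq and doc_range[g] >= min_docs}


def features(ngram):
    """Hypothetical features for a true/false bundle classifier."""
    fw = [w in FUNCTION_WORDS for w in ngram]
    return [1.0,                          # bias term
            1.0 if fw[0] else 0.0,        # starts with a function word
            1.0 if fw[-1] else 0.0,       # ends with a function word
            sum(fw) / len(ngram)]         # share of function words


def predict(w, x):
    """Return True if the linear score is positive (i.e. a genuine bundle)."""
    return sum(wi * xi for wi, xi in zip(w, x)) > 0.0


def train_perceptron(X, y, epochs=20):
    """Step 2 (stand-in model): a classic perceptron trained on manually
    labelled candidates, where y[i] is True for a genuine bundle."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if predict(w, xi) != yi:
                sign = 1.0 if yi else -1.0
                w = [wj + sign * xj for wj, xj in zip(w, xi)]
                mistakes += 1
        if mistakes == 0:  # converged on the training data
            break
    return w


def accuracy(w, X, y):
    """Fraction of labelled candidates the model classifies correctly."""
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)
```

With a real corpus, `frequent_ngrams` would run over the ACL-ARC texts, a sample of its output would be labelled by hand, and `accuracy` would be measured on held-out labelled candidates rather than on the training set. The features here are only placeholders for whatever the actual models use.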