2017 Fourth Asian Conference on Defence Technology - Japan

Improving distributed representation by feature selection of Wikipedia

Abstract

Distributed representation plays an important role in many applications of Natural Language Processing (NLP). The Word2Vec model has recently attracted attention, helped by easy access to enormous amounts of language data on the Internet, such as Wikipedia. To use Word2Vec effectively, we must pay attention not only to improving the method itself but also to the process of preparing the training data. In this paper, we demonstrate that adequate selection of training data can greatly improve the performance of Word2Vec compared to existing research. We also confirm that Wikipedia dump data, used as is, is not a good source of training data.
