International Conference on Applications of Natural Language to Information Systems

Testing Word Similarity: Language Independent Approach with Examples from Romance

Abstract

Identifying words with the same basic meaning (stemming) has important applications in Information Retrieval, above all in constructing word frequency lists. The usual morphology-based approaches (including the Porter stemmers) rely on language-dependent linguistic resources or knowledge, which causes problems when working with multilingual data and multi-thematic document collections. We suggest several empirical formulae with easy-to-adjust parameters and demonstrate how to construct such formulae for a given language using an inductive method of model self-organization. This method considers a set of models (formulae) of a given class and selects the best ones using training and test samples. We describe the method and give detailed examples for French, Italian, Portuguese, and Spanish. The formulae are examined on real domain-oriented document collections. Our approach can be easily applied to other European languages.
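
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulae themselves, so everything below is an assumption made for illustration: the one-parameter rule based on the shared prefix, the candidate thresholds, the combined-error criterion, and the small Spanish word pairs are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Tuple

# (word_a, word_b, same_basic_meaning) -- hand-labelled toy pairs (hypothetical)
Pair = Tuple[str, str, bool]


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by both words."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def make_model(threshold: float) -> Callable[[str, str], bool]:
    """Hypothetical formula: two words count as similar when the shared
    prefix covers at least `threshold` of the longer word."""
    def similar(a: str, b: str) -> bool:
        return common_prefix_len(a, b) / max(len(a), len(b)) >= threshold
    return similar


def error_rate(model: Callable[[str, str], bool], pairs: List[Pair]) -> float:
    """Share of pairs on which the model disagrees with the label."""
    wrong = sum(model(a, b) != label for a, b, label in pairs)
    return wrong / len(pairs)


def select_model(thresholds: List[float],
                 train: List[Pair], test: List[Pair]) -> float:
    """Pick the candidate with the lowest combined error on both samples
    (a simplified stand-in for a self-organization selection criterion)."""
    return min(
        thresholds,
        key=lambda t: error_rate(make_model(t), train)
        + error_rate(make_model(t), test),
    )


train = [
    ("cantar", "cantamos", True),
    ("cantar", "cantidad", False),
    ("libro", "libros", True),
    ("libro", "lugar", False),
]
test = [
    ("hablar", "hablamos", True),
    ("hablar", "habitar", False),
    ("comer", "comemos", True),
    ("comer", "correr", False),
]
thresholds = [0.3, 0.55, 0.9]

best = select_model(thresholds, train, test)
print(best)
```

On this toy data the middle candidate separates every pair correctly in both samples, while the lower threshold merges unrelated words and the higher one splits related ones. A real application would need far larger labelled samples for each language.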