Journal: ACM Transactions on Asian Language Information Processing

Mining Synonymous Transliterations from the World Wide Web


Abstract

The World Wide Web is an important source of information, and search engines can retrieve Web pages covering a wide range of content, including material of foreign origin. To make such material easier for local readers to understand, proper names in a foreign language such as English are often transliterated into a local language such as Chinese. Because translators differ and no translation standard exists, a single foreign proper noun may end up with several different transliterations. This causes incomplete search results in particular: a query that uses one transliteration as its keyword fails to retrieve Web pages that use a different transliteration, so important information may be missed. We present a framework for mining as many synonymous transliterations as possible from the Web for a given transliteration. The results can be used to construct a database of synonymous transliterations, which can in turn be used for query expansion to alleviate the incomplete search problem. Experimental results show that the proposed framework can effectively retrieve the set of snippets that may contain synonymous transliterations and then extract the target terms. Most of the extracted synonymous transliterations rank higher in similarity to the input transliteration than the noise terms do.
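The query-expansion use mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's framework: it assumes the mining step has already produced groups of synonymous transliterations, and it only shows how such a database might widen a search query. The function names `build_synonym_index` and `expand_query`, the `OR` query syntax, and the sample group (two common Chinese renderings of "Obama") are illustrative assumptions.

```python
from typing import Dict, FrozenSet, Iterable


def build_synonym_index(groups: Iterable[Iterable[str]]) -> Dict[str, FrozenSet[str]]:
    """Map every transliteration to the full group of its synonyms.

    Hypothetical helper: the groups are assumed to come from a
    mining step that is not shown here.
    """
    index: Dict[str, FrozenSet[str]] = {}
    for group in groups:
        members = frozenset(group)
        for term in members:
            index[term] = members
    return index


def expand_query(term: str, index: Dict[str, FrozenSet[str]]) -> str:
    """Build an OR query over all known transliterations of `term`.

    A term missing from the index is returned as a single quoted phrase.
    The OR syntax is an assumption; real search engines differ.
    """
    variants = sorted(index.get(term, frozenset({term})))
    return " OR ".join(f'"{v}"' for v in variants)


# Illustrative data only: two widely used renderings of "Obama".
SAMPLE_GROUPS = [["奥巴马", "歐巴馬"]]
SAMPLE_INDEX = build_synonym_index(SAMPLE_GROUPS)

if __name__ == "__main__":
    print(expand_query("歐巴馬", SAMPLE_INDEX))
```

Because each member of a group maps to the same set, querying with either transliteration yields the same expanded query, which is the property needed to avoid the incomplete-search problem described above.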
