Source: Information Retrieval Technology (foreign-language conference proceedings)

Synonyms Extraction Using Web Content Focused Crawling


Abstract

Documents and Web pages collected from the World Wide Web are considered one of the most important sources of information. Retrieving them with search engines yields a large amount of information, including foreign information, and facilitates information exchange and knowledge sharing. To make foreign words (for example, English words) easier for local readers to understand, they are often translated into the local language, such as Chinese. Because translators differ and no translation standard exists, the same foreign word can end up with several different transliterations, particularly for proper nouns such as personal and geographical names. For example, "Bin Laden" is rendered as "賓拉登" (binladeng) or "本拉登" (benladeng); both are valid synonymous transliterations. In this research, we propose an approach that identifies synonymous transliterations by mining Web pages retrieved by a search engine. Experiments show that, given an input transliteration, the proposed approach can effectively extract its synonymous transliterations.
