MINING BILINGUAL DICTIONARIES FROM MONOLINGUAL WEB PAGES

Abstract

Systems and methods for identifying translation pairs from web pages are provided. One disclosed method includes receiving monolingual web page data of a source language, and processing the web page data by detecting the occurrence of a predefined pattern in the web page data, and extracting a plurality of translation pair candidates. Each of the translation pair candidates may include a source language string and target language string. The method may further include determining whether each translation pair candidate is a valid transliteration. The method may also include, for each translation pair that is determined not to be a valid transliteration, determining whether each translation pair candidate is a valid translation. The method may further include adding each translation pair that is determined to be a valid translation or transliteration to a dictionary.
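
The flow described in the abstract (detect a pattern, extract candidate pairs, test for transliteration, fall back to a translation test, then add accepted pairs to a dictionary) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation. The abstract only says "a predefined pattern", so the regular expression here (a run of Chinese characters followed by a parenthesized Latin-script string) is an assumption, as are the names `extract_candidates` and `mine_pairs`. The two validity checks are passed in as caller-supplied functions because the abstract does not say how they are computed.

```python
import re
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (source language string, target language string)

# Assumed pattern: CJK characters directly followed by a parenthesized
# Latin-script string, with ASCII or full-width parentheses.
# The abstract does not specify the pattern; this one is hypothetical.
PATTERN = re.compile(r"([\u4e00-\u9fff]+)\s*[(（]([A-Za-z][A-Za-z .\-]*)[)）]")


def extract_candidates(page_text: str) -> List[Pair]:
    """Return every (source, target) candidate matched by PATTERN."""
    return [(m.group(1), m.group(2).strip()) for m in PATTERN.finditer(page_text)]


def mine_pairs(
    page_text: str,
    is_valid_transliteration: Callable[[Pair], bool],
    is_valid_translation: Callable[[Pair], bool],
    dictionary: Dict[str, Dict[str, str]],
) -> Dict[str, Dict[str, str]]:
    """Add accepted candidates from one page to `dictionary`.

    Each candidate is tested as a transliteration first; only candidates
    that fail that test are then tested as translations.
    """
    for pair in extract_candidates(page_text):
        source, target = pair
        if is_valid_transliteration(pair):
            label = "transliteration"
        elif is_valid_translation(pair):
            label = "translation"
        else:
            continue  # rejected by both checks
        dictionary.setdefault(source, {})[target] = label
    return dictionary
```

A caller would supply its own validators, for example `mine_pairs(text, my_translit_check, my_translation_check, {})`, where both check functions are placeholders for whatever models are actually used. Note that the assumed pattern captures the whole run of Chinese characters before the parenthesis, so a real system would still need to decide where the source term actually begins.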

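Bibliographic data

  • Publication number: WO2009035863A2
  • Patent type:
  • Publication date: 2009-03-19
  • Original format: PDF
  • Applicant/patentee: MICROSOFT CORPORATION
  • Application number: WO2008US74672
  • Inventor: GAO JIANFENG
  • Filing date: 2008-08-28
  • Classification: G06F17/27; G06F17/28
  • Country: WO
  • Database entry time: 2022-08-21 19:19:54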
