Automatic acquisition of a parallel corpus from a network
Abstract
Network pages are identified based on whether they include image alternative text indicating that the pages contain links to pages that are translations of each other. A plurality of pages and their respective uniform resource locators (URLs) are downloaded from a server associated with the domain name of the identified network pages. The URLs are used to identify a set of candidate parallel page pairs, and a set of features is created for each candidate pair. The sets of features are then used to identify parallel page pairs, in which the two pages are translations of each other.
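The pipeline described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the alt-text cues in `LANGUAGE_CUES`, the URL language markers in `LANG_MARKER`, the idea of pairing URLs by masking those markers, and the single `length_ratio` feature. The abstract does not say which cues, URL rules, or features are actually used, and the final step of deciding which candidates are true parallel pairs is left out.

```python
from html.parser import HTMLParser
from itertools import combinations
import re

# Hypothetical alt-text cues; the abstract does not list the actual strings.
LANGUAGE_CUES = {"english", "chinese", "english version", "chinese version"}

# Hypothetical language markers that may appear as a URL path or file segment.
LANG_MARKER = re.compile(
    r"(?<=[/_.\-])(en|zh|english|chinese)(?=[/_.\-])", re.IGNORECASE
)


class AltTextCollector(HTMLParser):
    """Collects the lowercased alt attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alts.append(alt.strip().lower())


def has_language_alt_text(html):
    """Return True if any image alt text matches a language cue."""
    collector = AltTextCollector()
    collector.feed(html)
    return any(alt in LANGUAGE_CUES for alt in collector.alts)


def candidate_pairs(urls):
    """Pair URLs that become identical once the language marker is masked."""
    groups = {}
    for url in urls:
        key = LANG_MARKER.sub("*", url)
        if key != url:  # skip URLs that carry no language marker
            groups.setdefault(key, []).append(url)
    pairs = []
    for members in groups.values():
        pairs.extend(combinations(sorted(members), 2))
    return sorted(pairs)


def pair_features(text_a, text_b):
    """Build an illustrative feature set for one candidate pair."""
    longer = max(len(text_a), len(text_b))
    ratio = min(len(text_a), len(text_b)) / longer if longer else 0.0
    return {"length_ratio": ratio}
```

In this sketch, a site would first be selected with `has_language_alt_text`, its downloaded URLs passed to `candidate_pairs`, and each resulting pair described by `pair_features` before some decision step (not shown) accepts or rejects it.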