Aligning hierarchical and sequential document trees to identify parallel data
Abstract
A set of candidate parallel pages is identified based on trigger words in one or more pages downloaded from a given network location (such as a website). The document trees representing the candidate pages are aligned to identify translationally parallel content and hyperlinks. The parallel content is then fed into a conventional sentence aligner to extract parallel sentences. The parallel hyperlinks usually point to other parallel documents, which leads to recursive mining of parallel documents.
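The abstract leaves the trigger words and the tree-alignment procedure unspecified. The Python sketch below shows one hypothetical way to realize the pipeline; it is not the patented method. All names (`TRIGGER_WORDS`, `Node`, `candidate_links`, `linearize`, `align`, `mine_pair`, `mine_site`) are illustrative. It assumes pages are already parsed into small in-memory trees (the `pages` dict stands in for downloading a website), flattens each tree by pre-order traversal, and aligns the two sequences with a swapped-in edit-distance alignment in which nodes match only when their depth and tag agree. The sentence aligner is not implemented; the returned text pairs are what would be passed to it.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical trigger words: link texts that often point to a translated page.
TRIGGER_WORDS = {"english", "chinese", "中文", "英文"}


@dataclass
class Node:
    """Minimal stand-in for a parsed DOM node (an assumption, not a real parser API)."""
    tag: str
    text: str = ""
    href: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def candidate_links(page: Node) -> List[str]:
    """Return hrefs of <a> nodes whose link text is a trigger word."""
    found, stack = [], [page]
    while stack:
        node = stack.pop()
        if node.tag == "a" and node.href and node.text.strip().lower() in TRIGGER_WORDS:
            found.append(node.href)
        stack.extend(reversed(node.children))  # visit children in document order
    return found


def linearize(node: Node, depth: int = 0) -> List[Tuple[int, Node]]:
    """Pre-order traversal: flatten the hierarchy into a (depth, node) sequence."""
    seq = [(depth, node)]
    for child in node.children:
        seq.extend(linearize(child, depth + 1))
    return seq


def _same(x: Tuple[int, Node], y: Tuple[int, Node]) -> bool:
    # Two nodes may be aligned only if they sit at the same depth with the same tag.
    return x[0] == y[0] and x[1].tag == y[1].tag


def align(a: List[Tuple[int, Node]], b: List[Tuple[int, Node]]) -> List[Tuple[Node, Node]]:
    """Edit-distance alignment: matches are free, insertions and deletions cost 1."""
    n, m = len(a), len(b)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(cost[i - 1][j], cost[i][j - 1]) + 1
            if _same(a[i - 1], b[j - 1]):
                best = min(best, cost[i - 1][j - 1])
            cost[i][j] = best
    # Trace back from the bottom-right cell to recover the matched node pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if _same(a[i - 1], b[j - 1]) and cost[i][j] == cost[i - 1][j - 1]:
            pairs.append((a[i - 1][1], b[j - 1][1]))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + 1:
            i -= 1  # node in `a` has no counterpart
        else:
            j -= 1  # node in `b` has no counterpart
    pairs.reverse()
    return pairs


def mine_pair(src: Node, tgt: Node):
    """Split aligned nodes into parallel text (for a sentence aligner) and parallel links."""
    pairs = align(linearize(src), linearize(tgt))
    texts = [(x.text, y.text) for x, y in pairs if x.tag != "a" and x.text and y.text]
    links = [(x.href, y.href) for x, y in pairs if x.tag == "a" and x.href and y.href]
    return texts, links


def mine_site(pages: Dict[str, Node], start: str) -> List[Tuple[str, str]]:
    """Follow trigger links from `start`, then keep mining through parallel hyperlinks."""
    corpus, seen = [], set()
    queue = deque((start, h) for h in candidate_links(pages[start]) if h in pages)
    while queue:
        u, v = queue.popleft()
        key = frozenset((u, v))
        if u == v or key in seen:
            continue  # skip self-links and pairs already mined in either direction
        seen.add(key)
        texts, links = mine_pair(pages[u], pages[v])
        corpus.extend(texts)
        queue.extend((p, q) for p, q in links if p in pages and q in pages)
    return corpus
```

For example, an English index page whose `中文` link points to a Chinese index page with the same layout yields its heading and paragraph pairs, and the aligned `About`/`关于` links queue the two about pages for the next round. A work queue with a `seen` set replaces literal recursion so that pages linking back to each other are mined only once.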