Avoiding Confusion Arising from Similar Anchor Expressions
Abstract
A page image is divided into areas, and an attribute (e.g., text, photo, drawing) is added to each area to identify it. Character recognition is performed on a caption area and a body text area, to which the caption attribute and the body text attribute have been added, respectively. Metadata is then associated with an object area accompanied by a caption area, as follows. An anchor expression (AE), composed of a character string, and a caption expression, composed of the character string other than the AE, are extracted from the character recognition result of the caption area. It is determined whether there are a plurality of object areas accompanied by caption areas that include an identical AE. An explanatory text including the AE is extracted from the character recognition result of the body text area. If only one object area is accompanied by a caption area including the identical AE, that object area is associated with metadata obtained from the extracted explanatory text. If a plurality of object areas are accompanied by caption areas including the identical AE, a degree of similarity is calculated between the caption expression of each such caption area and each explanatory text including the identical AE. Based on the calculated similarity degrees, the optimal explanatory text is determined for each object area, and the metadata obtained from that text is associated with the respective object area.
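The association step can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: area segmentation and character recognition are assumed already done (inputs are recognized caption strings and body text), anchor expressions are assumed to look like "Figure N" or "Table N", each body sentence is treated as one candidate explanatory text, and the similarity degree is approximated by bag-of-words cosine similarity. The "optimal" choice for a shared anchor is modeled as a one-to-one assignment maximizing total similarity. The function names (`split_caption`, `cosine`, `associate`) are hypothetical, and the explanatory text itself stands in for the metadata derived from it.

```python
import re
from collections import Counter, defaultdict
from itertools import permutations
from math import sqrt

# Assumption: anchor expressions look like "Figure 3" or "Table 2".
ANCHOR_RE = re.compile(r"\b(?:Figure|Table)\s+\d+\b")


def split_caption(caption):
    """Split a recognized caption into (anchor expression, caption expression)."""
    m = ANCHOR_RE.search(caption)
    if m is None:
        return None, caption.strip()
    rest = (caption[:m.start()] + caption[m.end():]).strip(" :.-")
    return m.group(0), rest


def cosine(a, b):
    """Bag-of-words cosine similarity (an assumed stand-in for the similarity degree)."""
    ca = Counter(re.findall(r"\w+", a.lower()))
    cb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(n * cb[w] for w, n in ca.items())
    na = sqrt(sum(n * n for n in ca.values()))
    nb = sqrt(sum(n * n for n in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def associate(objects, body):
    """Map object_id -> explanatory text for a list of (object_id, caption) pairs.

    Objects whose caption contains no anchor expression are left out.
    """
    # Assumption: each sentence of the body text is one candidate explanatory text.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", body) if s.strip()]
    groups = defaultdict(list)  # anchor expression -> [(object_id, caption expression)]
    for obj_id, caption in objects:
        anchor, expr = split_caption(caption)
        if anchor is not None:
            groups[anchor].append((obj_id, expr))

    result = {}
    for anchor, members in groups.items():
        # Word boundaries keep "Figure 1" from matching inside "Figure 12".
        pattern = re.compile(r"\b" + re.escape(anchor) + r"\b")
        texts = [s for s in sentences if pattern.search(s)]
        if not texts:
            for obj_id, _ in members:
                result[obj_id] = None
        elif len(members) == 1:
            # Unique anchor: no ambiguity, so use every explanatory text found.
            result[members[0][0]] = " ".join(texts)
        else:
            # Shared anchor: score each caption expression against each text.
            sims = [[cosine(expr, t) for t in texts] for _, expr in members]
            if len(texts) >= len(members):
                # One-to-one assignment maximizing total similarity
                # (exhaustive search, so only suitable for small groups).
                best = max(
                    permutations(range(len(texts)), len(members)),
                    key=lambda p: sum(sims[i][j] for i, j in enumerate(p)),
                )
            else:
                # Fewer texts than objects: fall back to each object's best match.
                best = [max(range(len(texts)), key=lambda j: sims[i][j])
                        for i in range(len(members))]
            for (obj_id, _), j in zip(members, best):
                result[obj_id] = texts[j]
    return result


if __name__ == "__main__":
    objects = [
        ("photo_a", "Figure 1 Camera body"),
        ("photo_b", "Figure 1 Flash unit"),  # same anchor, different object
        ("chart", "Figure 2 Sales"),
    ]
    body = ("The camera body in Figure 1 has a grip. "
            "The flash unit in Figure 1 fires twice. "
            "Figure 2 shows sales by year.")
    for obj_id, text in associate(objects, body).items():
        print(obj_id, "->", text)
```

On the sample page, the two "Figure 1" objects each receive the sentence that shares wording with their own caption expression, while "chart" receives the only sentence mentioning "Figure 2". A production system would likely replace the exhaustive permutation search with a proper assignment algorithm and use a similarity measure suited to the OCR output's language.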