
Finding titles and photos in scanned document images


Abstract

The bitmap image data is analyzed by connected component extraction to identify connected components that represent either individual characters or letters, or regions of a nontext image. The connected components are classified as text or nontext based on geometric attributes such as the number of holes, arcs, and line ends comprising each component. A nearest-neighbor analysis then identifies which text components represent lines or strings of text, and each line or string is further analyzed to determine its vertical or horizontal orientation. Thereafter, separate vertical and horizontal font height filters are used to identify those text strings that are the most likely title candidates. For the most likely title candidates, a bounding box is defined, which can be associated with or overlaid upon the original bitmap data to select the title region for further processing or display. Captions and photographs can also be located.
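
The pipeline described in the abstract can be illustrated with a small, simplified Python sketch. It is not the patented method: it covers only connected component extraction, a greedy nearest-neighbor grouping into horizontal strings, and a height-based choice of the title candidate. The text/nontext classification by holes, arcs, and line ends, the orientation analysis, and the separate vertical filter are omitted. All function names, the `max_gap` parameter, and the toy `page` bitmap are assumptions made for illustration.

```python
from collections import deque

def connected_components(bitmap):
    """Return bounding boxes (top, left, bottom, right) of 4-connected foreground regions."""
    rows, cols = len(bitmap), len(bitmap[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not bitmap[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and bitmap[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def group_into_lines(boxes, max_gap=3):
    """Greedy grouping: attach each box to a line whose last box overlaps it
    vertically and lies at most max_gap pixels to its left (hypothetical rule)."""
    lines = []
    for box in sorted(boxes, key=lambda b: b[1]):  # left to right
        for line in lines:
            last = line[-1]
            overlaps = box[0] <= last[2] and last[0] <= box[2]
            gap = box[1] - last[3] - 1
            if overlaps and gap <= max_gap:
                line.append(box)
                break
        else:
            lines.append([box])
    return lines

def line_box(line):
    """Bounding box enclosing every component of one text line."""
    return (min(b[0] for b in line), min(b[1] for b in line),
            max(b[2] for b in line), max(b[3] for b in line))

def title_candidate(bitmap, max_gap=3):
    """Return the bounding box of the tallest text line, or None if the page is empty."""
    lines = group_into_lines(connected_components(bitmap), max_gap)
    if not lines:
        return None
    return max((line_box(l) for l in lines), key=lambda b: b[2] - b[0] + 1)

# Toy page: one line of tall glyphs (rows 1-4) above one line of short glyphs (rows 6-7).
page = [
    "..........",
    ".##.##.##.",
    ".##.##.##.",
    ".##.##.##.",
    ".##.##.##.",
    "..........",
    ".#.#.#.#..",
    ".#.#.#.#..",
    "..........",
]
bitmap = [[ch == "#" for ch in row] for row in page]
print(title_candidate(bitmap))  # (1, 1, 4, 8)
```

Choosing the tallest line stands in for the font height filter: on this toy page the taller string is returned as the title region, and the resulting box could then be overlaid on the original bitmap.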
