Method for a uniform symbolic description of document patterns in the form of data structures in an automated apparatus

Abstract

1. Method for a uniform symbolic description of document patterns in the form of data structures in an automated apparatus, it being possible for the document patterns to be composed of the components "text", "graphics" and/or "image", for the purpose of the division of the document pattern into these components and for the further processing of these components in evaluating modules for special application cases,

descriptors representing interpolation nodes along edges in a respective document pattern being extracted line-by-line from the digitized output signals of an optical scanning device, descriptors being calculated in a first module and the data relating to the descriptors being stored in an in each case free storage location of a data memory, whereupon the data memory issues an acknowledgment with information about the storage location used,

a second module being activated by the transfer of the returned address information and accepting this address information together with list indices from lists, the second module using this address information to incorporate the descriptor stored in the data memory into an already existing descriptor string with an edge representative or to begin a new descriptor string with the formation of a new edge representative necessary for this, the descriptor string being in each case a concatenation of individual descriptors with the aid of pointers and a new descriptor, which is stored in an arbitrary storage location in the data memory, being connected by means of pointers to the respective already existing string or already existing strings, the position of the associated edge representative(s) being taken directly from the lists with the aid of the list indices transferred, this information being entered in the lists again by the second module when there is a change of representative of the descriptor string, the second module transferring the index of this edge for further processing to a third module whenever a completely closed edge is formed by a descriptor string,

the third and following modules concatenating a plurality of edges to form hierarchically organized data structures and classifying these as text components, graphics components and/or image components, an output module making available the index of the representative of a complete data structure which is hierarchically organized in this manner and which contains the entire information stored in the data memory about these representatives including the information about its subordinate representatives, to a more far-reaching, evaluating process in evaluating modules, and only the index of the superordinate representative being transferred to one of the evaluating modules and this module processing the information stored therein under the index of the respective representative by its own access to the data memory,

as well as for extraction of recognition characteristics from the image description obtained in this manner, in which it is provided

- that fine structure elements are formed in a fine structure element module,
- that these fine structure elements are combined during the line-by-line processing of the document image to form fine edge strings which reproduce the edge contours of the individual patterns to the pixel exactly,
- that the individual sections of these fine edges are associated with the descriptors in such a way that they describe exactly the edge contour between in each case two descriptors,
- that for each new scanning line the data relating to the fine structure elements are stored in the form of fine structure substrings describing the pattern edges exactly to the pixel in respectively free memory locations of the data memory and are concatenated with one another in accordance with the contour of the pattern edge,
- that a newly formed fine structure substring in an image line is concatenated on the basis of list information also supplied to an already existing fine structure string to be completed and the position of the fine structure element which forms the new open edge end of the fine structure string is reported back in an acknowledgement to the fine structure element module, the fine structure element module noting by means of this acknowledgement the position of the free edge end in the data memory either in additional lists to the lists for the current image line (YA) and the preceding image line (YA1) or in the edge representative superordinate to the fine edge string, as a result of which it is possible to place at the open edge the fine structure substring being produced in the next image line as a continuation of the edge,
- that the fine structure elements in each case contain information about the orientation of the respective pixel, a pointer to the next fine structure element and the mid-point coordinates (i, j) of the element in an overlaid fine coordinate system (Fig. 1),
- that there is produced in each scanned image line for each sectioned pattern edge one of said fine structure substrings which is concatenated to the open end of the associated fine structure string (Fig. 2, Fig. 3),
- that, when it is entered in the descriptor string in the superordinate edge representative, a new descriptor finds the information about the open edge end of the fine structure string associated with it,
- that this fine structure string is run through up to the position of the descriptor,
- that a pointer pointing to this fine structure element is entered in the descriptor,
- that pointers refer to the start and end of the associated fine structure string in the edge representative in the data memory,
- that, when two edges flow together, analogous operations to those for the descriptor strings with respect to creation, combination, extension and ending with regard to the fine structure strings, are carried out in the edge module,
- that the hierarchically organized data structure preferably of each object to be recognized is made available to the evaluating module for forming recognition characteristics,
- that projection vectors (C(alpha, n), t(alpha, n)) known per se are calculated from the fine structure elements of such a data structure,
- that the fine structure strings of the object to be recognized are run through,
- that the projection vector (C(alpha, n)) is calculated by each individual element from the fine structure string being mapped on an element n in the projection vector (C(alpha, n)), n being determined from the i and j coordinates of a fine structure element and the projection angle (alpha), the value "c(alpha, n)" being incremented by 1 for the corresponding element,
- that the projection vector (t(alpha, n)) is calculated by the fine structure strings of the object to be recognized being run through,
- that, in conjunction with this, the value n and a value f are calculated for each fine structure element as a function of the coordinates (i, j) as well as of the projection angle (alpha), the value f being calculated from a value q and a value b by multiplication, and the value q corresponding to the distance between the fine structure element and a straight line (t(alpha, n)) (Fig. 5), the associated value for each fine structure element being added in the memory cell 2 of "t(alpha, n)" in such a manner that f-values deliver a positive amount on the side of an edge element facing the pattern and deliver a negative amount on the side facing away from the pattern, an f-value being calculated as the product from the distance between the mid-point coordinates of a fine structure element and the projection lines "t(alpha, n)", namely the q-value, and the projection width b of a fine structure element on the projection line "t(alpha, n)", and
- that the recognition characteristics to be extracted are calculated in a manner known per se from the projection vectors (C(alpha, n), t(alpha, n)).
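
The fine structure strings and the two projection vectors described in the claim can be illustrated with a small Python sketch. It is not the patented implementation: the claim does not give the formula for n, the encoding of the "orientation" information, or the exact sign rule, so the names `FineElement`, `FineString`, `append_substring` and `projection_vectors` are hypothetical, the orientation is assumed to be stored as an outward unit normal, n is assumed to be the floored coordinate along the projection axis, and q is assumed to be the signed distance from a projection line through the origin. With these assumptions, each element adds 1 to C(alpha, n) and f = q * b (with a sign taken from the orientation) to t(alpha, n).

```python
import math
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class FineElement:
    i: float                               # mid-point coordinates (i, j) in the fine grid
    j: float
    normal: Tuple[int, int]                # ASSUMED encoding of "orientation": outward normal
    next: Optional["FineElement"] = None   # pointer to the next fine structure element


class FineString:
    """Pointer-linked fine structure string that tracks its start and open end."""

    def __init__(self) -> None:
        self.start: Optional[FineElement] = None
        self.open_end: Optional[FineElement] = None

    def append_substring(self, elements: List[FineElement]) -> None:
        # Concatenate a newly formed substring to the open end of the string.
        for e in elements:
            if self.open_end is None:
                self.start = e
            else:
                self.open_end.next = e
            self.open_end = e

    def __iter__(self) -> Iterator[FineElement]:
        # Run through the string by following the next-pointers.
        e = self.start
        while e is not None:
            yield e
            e = e.next


def projection_vectors(string: FineString, alpha: float, size: int):
    """Return (c, t) for one projection angle; all formulas are assumptions."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    c = [0] * size
    t = [0.0] * size
    for e in string:
        n = math.floor(e.i * ca + e.j * sa)           # assumed bin index along the axis
        q = -e.i * sa + e.j * ca                      # signed distance to the projection line
        nd = -e.normal[0] * sa + e.normal[1] * ca     # normal component along the rays
        b = abs(nd)                                   # projected width of a unit edge element
        sign = 1.0 if nd > 0 else -1.0                # assumed sign rule from the orientation
        if 0 <= n < size:
            c[n] += 1                                 # count the element in C(alpha, n)
            t[n] += sign * q * b                      # accumulate f = q * b in t(alpha, n)
    return c, t


# Contour of a 2 x 2 pixel square, built substring by substring.
square = FineString()
square.append_substring([FineElement(0.5, 0.0, (0, -1)), FineElement(1.5, 0.0, (0, -1))])
square.append_substring([FineElement(2.0, 0.5, (1, 0)), FineElement(2.0, 1.5, (1, 0))])
square.append_substring([FineElement(1.5, 2.0, (0, 1)), FineElement(0.5, 2.0, (0, 1))])
square.append_substring([FineElement(0.0, 1.5, (-1, 0)), FineElement(0.0, 0.5, (-1, 0))])
c, t = projection_vectors(square, alpha=0.0, size=3)
```

For this square and alpha = 0, `c` is `[4, 2, 2]` and `t` is `[2.0, 2.0, 0.0]`. Under the assumed sign rule, the entries of `t` sum to the enclosed area (4 pixels), which is one reason a signed product of distance and projected width is a plausible reading of the f-value.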
