TEXT EXTRACTION METHOD, TEXT EXTRACTION DEVICE AND TEXT EXTRACTION PROGRAM

Abstract

PROBLEM TO BE SOLVED: To extract a text part with high accuracy from a document expressed in a tree structure.

SOLUTION: A text part determination function section 8 classifies the feature information extracted for each node of an input document according to the data stored in a database 4, determines whether or not each node is a text part, and stores the determination result in a storage section 9. A boundary acquisition function section 10 acquires the determination result by referring to the storage section 9, then successively retrieves the boundaries of the text part by tracing from lower nodes to upper nodes in the tree structure of the input document according to the determination result for the lower nodes, and stores the retrieval result in a storage section 11. A text extraction function section 12 extracts the character strings under the boundaries as the text part by referring to the storage section 11.
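The three stages described in the abstract (per-node determination, bottom-up boundary retrieval, extraction of the strings under the boundaries) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the `Node` class, the length-based stand-in for the classifier (the abstract describes classification of feature information against data stored in a database, which is not reproduced), and the rule that an upper node is absorbed into the text part when more than half of its child nodes already belong to it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict/set keys
class Node:
    """Hypothetical tree node, e.g. one element of an HTML document."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def post_order(node: Node) -> Iterator[Node]:
    """Yield lower nodes before the upper nodes that contain them."""
    for child in node.children:
        yield from post_order(child)
    yield node


def pre_order(node: Node) -> Iterator[Node]:
    """Yield nodes in document order."""
    yield node
    for child in node.children:
        yield from pre_order(child)


def classify(root: Node, is_text_part: Callable[[Node], bool]) -> Dict[Node, bool]:
    """Stage 1: store a text-part / not-text-part decision for every node."""
    return {n: is_text_part(n) for n in post_order(root)}


def find_boundaries(root: Node, decisions: Dict[Node, bool]) -> List[Node]:
    """Stage 2: trace from lower to upper nodes and return the topmost covered nodes.

    Assumed rule: a leaf is covered if it was classified as text part; an
    internal node is covered if more than half of its children are covered.
    """
    covered = set()
    for node in post_order(root):
        if not node.children:
            if decisions[node]:
                covered.add(node)
        elif 2 * sum(c in covered for c in node.children) > len(node.children):
            covered.add(node)
    return [n for n in post_order(root)
            if n in covered and (n.parent is None or n.parent not in covered)]


def extract_text(boundaries: List[Node]) -> str:
    """Stage 3: collect every character string under the boundary nodes."""
    return "\n".join(n.text for b in boundaries for n in pre_order(b) if n.text)


# Small example document.
root = Node("body")
nav = root.add(Node("nav"))
nav.add(Node("a", "Home"))
nav.add(Node("a", "About"))
article = root.add(Node("article"))
article.add(Node("p", "First paragraph of the main text, long enough."))
article.add(Node("p", "Second paragraph of the main text, also long."))
article.add(Node("span", "Ad"))
footer = root.add(Node("footer"))
footer.add(Node("small", "Copyright"))

# Stand-in classifier (assumption): long strings count as text part.
decisions = classify(root, lambda n: len(n.text) >= 30)
boundaries = find_boundaries(root, decisions)
text = extract_text(boundaries)
print([b.tag for b in boundaries])
print(text)
```

In this example the only boundary is the `article` node, so the navigation and footer strings are excluded. The short `span` inside `article` is still extracted, because under the assumed rule all strings below a boundary are taken as text part, including nodes that were individually classified as non-text.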
