Extensible database framework for management of unstructured and semi-structured documents

Abstract

A method and system for querying a collection of unstructured or semi-structured documents to identify the presence of keywords and/or keyphrases and to provide context and/or content for them. The documents are analyzed and assigned a node structure that includes an ordered sequence of mutually exclusive node segments, or strings. Each node has an associated set of at least four, five, or six attributes carrying node information and can represent either a format marker or text; the last node in any node segment is usually a text node. A keyword (or keyphrase) is specified, and the last node in each node segment is searched for a match with it. When a match is found at a query node, or at a node determined with reference to a query node, the system displays the context and/or the content of the query node.
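The node-segment search described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not name the node attributes, so the four used here (`node_id`, `segment_id`, `kind`, `data`) are hypothetical, as are the helper names `build_segments` and `query`. The sketch also assumes HTML-like markup: start tags stand in for format markers, and each run of text closes a segment, so every node belongs to exactly one segment.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Node:
    # Hypothetical attribute set; the abstract only says "at least four".
    node_id: int      # position of the node in document order
    segment_id: int   # index of the segment that owns this node
    kind: str         # "marker" (format marker) or "text"
    data: str         # tag name for a marker, text for a text node


class SegmentBuilder(HTMLParser):
    """Groups start tags and the text that follows them into segments."""

    def __init__(self):
        super().__init__()
        self.segments = []   # completed segments, in document order
        self._pending = []   # markers seen since the last text node
        self._next_id = 0

    def _new_node(self, kind, data):
        node = Node(self._next_id, len(self.segments), kind, data)
        self._next_id += 1
        return node

    def handle_starttag(self, tag, attrs):
        self._pending.append(self._new_node("marker", tag))

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # A text node closes the current segment, so it is always last.
        self._pending.append(self._new_node("text", text))
        self.segments.append(self._pending)
        self._pending = []


def build_segments(document):
    builder = SegmentBuilder()
    builder.feed(document)
    builder.close()
    return builder.segments


def query(segments, keyword):
    """Search only the last node of each segment for the keyword.

    Returns (context, content) pairs, where context is the list of
    format markers that precede the matching text node.
    """
    needle = keyword.lower()
    hits = []
    for segment in segments:
        last = segment[-1]
        if last.kind == "text" and needle in last.data.lower():
            context = [node.data for node in segment[:-1]]
            hits.append((context, last.data))
    return hits
```

For example, calling `query(build_segments(doc), "valve")` on a small marked-up report returns each matching text together with the markers that introduce it. Matching here is a case-insensitive substring test, chosen for brevity; the abstract does not specify the matching rule.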