Information Extraction Using XPath

Abstract

To improve the accuracy of document classification, it is important to characterize not only individual words but also the relations among them. Classification from this point of view requires a different approach to document analysis. In this paper, we first show how to find a pattern tree embedded as a sub-tree in an XML data tree, simply by applying the XPath technique. This problem corresponds to searching XML documents for characteristic words and their relations. Second, we ask which words and relations actually occur in the XML documents. This is the problem of finding the most frequent patterns in the documents, often called the most frequent sub-trees in the XML domain. This second problem is likewise solved here simply by applying the XPath technique.
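The first problem, locating an embedded pattern tree with XPath, can be illustrated with a minimal sketch. It assumes a hypothetical representation of a pattern tree as nested `(label, [children])` tuples; `pattern_to_xpath`, `matches_at` and `find_embeddings` are illustrative names, not the paper's code. Each pattern edge is translated into a descendant step (`.//`), which is what makes the match "embedded" (ancestor-descendant) rather than strictly parent-child. Two caveats follow from plain XPath predicate semantics: sibling order is ignored, and two sibling pattern nodes with the same label are not forced onto distinct data nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical pattern-tree representation: (label, [child patterns]).


def pattern_to_xpath(pattern, top=True):
    """Translate a pattern tree into an XPath 1.0 expression. Every pattern
    edge becomes a descendant step (.//), so the query matches the pattern
    as an embedded (ancestor-descendant) sub-tree, not only parent-child."""
    label, children = pattern
    predicates = "".join(
        "[" + pattern_to_xpath(child, top=False) + "]" for child in children
    )
    return ("//" if top else ".//") + label + predicates


def matches_at(node, pattern):
    """Evaluate the same query with ElementTree. Its XPath subset only allows
    simple predicates such as [tag] or [@attr], so the nested predicates are
    checked recursively: each child pattern must occur somewhere below node."""
    label, children = pattern
    if node.tag != label:
        return False
    return all(
        any(matches_at(d, child) for d in node.iterfind(".//" + child[0]))
        for child in children
    )


def find_embeddings(root, pattern):
    """Data nodes (root included) at which the pattern tree is embedded."""
    return [n for n in root.iter(pattern[0]) if matches_at(n, pattern)]


doc = ET.fromstring(
    "<corpus>"
    "<article><title>XPath</title>"
    "<body><section><keyword>tree</keyword></section></body></article>"
    "<article><title>Other</title><section/></article>"
    "</corpus>"
)
pattern = ("article", [("title", []), ("section", [("keyword", [])])])
print(pattern_to_xpath(pattern))  # //article[.//title][.//section[.//keyword]]
print(len(find_embeddings(doc, pattern)))  # 1
```

The first article matches even though `section` is a grandchild of `article` (it sits under `body`), which is the embedded-sub-tree behaviour; the second article fails because its `section` has no `keyword`. With a full XPath 1.0 engine, for example lxml's `xpath()` method, the generated string could be evaluated directly instead of through `matches_at`.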
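For the second problem, finding the most frequent sub-trees, the abstract does not say how candidate patterns are generated. The sketch below therefore swaps in a generic Apriori-style level-wise search, which is an assumption and not the paper's method: start from frequent labels, grow each frequent pattern by attaching one leaf anywhere, and keep candidates whose support (the number of documents in which the pattern is embedded, tested with ElementTree's descendant steps) reaches a threshold. The names `frequent_subtrees`, `min_support` and `max_size` are hypothetical. Simplifications: patterns are unordered, repeated sibling labels are not generated, and pattern size is capped.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical pattern representation: (label, [child patterns]).


def embedded_at(node, pattern):
    """True if pattern is embedded at node: every pattern edge maps to an
    ancestor-descendant step, located with the ElementTree XPath subset."""
    label, children = pattern
    return node.tag == label and all(
        any(embedded_at(d, c) for d in node.iterfind(".//" + c[0]))
        for c in children
    )


def support(docs, pattern):
    """Number of documents in which the pattern occurs at least once."""
    return sum(
        any(embedded_at(n, pattern) for n in doc.iter(pattern[0])) for doc in docs
    )


def canon(pattern):
    """Canonical string of an unordered pattern tree, used to drop duplicates."""
    label, children = pattern
    return label + "(" + ",".join(sorted(canon(c) for c in children)) + ")"


def extensions(pattern, labels):
    """All patterns obtained by attaching one new leaf under any node."""
    label, children = pattern
    sibling_labels = {c[0] for c in children}
    for new in labels:
        if new not in sibling_labels:  # simplification: no repeated sibling labels
            yield (label, children + [(new, [])])
    for i, child in enumerate(children):
        for ext in extensions(child, labels):
            yield (label, children[:i] + [ext] + children[i + 1:])


def frequent_subtrees(docs, min_support, max_size=3):
    """Level-wise search; support never grows when a pattern is extended,
    so only frequent patterns need to be extended at each level."""
    label_support = Counter(t for doc in docs for t in {n.tag for n in doc.iter()})
    labels = sorted(t for t, s in label_support.items() if s >= min_support)
    level = [(t, []) for t in labels]
    result = {canon(p): label_support[p[0]] for p in level}
    for _ in range(max_size - 1):
        candidates = {}
        for p in level:
            for ext in extensions(p, labels):
                candidates.setdefault(canon(ext), ext)
        level = []
        for key, cand in candidates.items():
            s = support(docs, cand)
            if s >= min_support:
                result[key] = s
                level.append(cand)
    return result


docs = [ET.fromstring(s) for s in (
    "<doc><title/><section><keyword/></section></doc>",
    "<doc><section><para><keyword/></para></section></doc>",
    "<doc><title/><para/></doc>",
)]
for key, s in sorted(frequent_subtrees(docs, min_support=2).items()):
    print(key, s)
```

In this toy corpus, `doc(section(keyword()))` is frequent because `keyword` lies below `section` in two documents, once directly and once through `para`; a parent-child-only (induced) definition would count it in just one.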
