Proceedings of the 2011 ACM Symposium on Document Engineering

Developer-Friendly Annotation-Based HTML-to-XML Transformation Technology


Abstract

The amount of information accessible on the web is huge. Web users today expect a more integrated way to access that information, yet it is still rather difficult to integrate information from different web sites, because most web pages are authored in HTML, a presentation-oriented language that is usually considered unstructured. Many research efforts aim at extracting information from web pages. Existing work typically transforms the extraction results into structured or semi-structured data formats so that other applications can process them further and discover more useful information. Nevertheless, the unstructured nature of HTML makes this transformation process complex, so such approaches can hardly be widely adopted. In this paper, an annotation-based HTML-to-XML transformation technology is proposed. The mechanism is developed with both usability and simplicity in mind. With the proposed mechanism, ordinary web site developers simply add annotations to their web pages. Annotated web pages can then be processed by our software libraries and transformed into machine-understandable XML documents. Software agents can thus be developed on top of our technology.
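
The abstract does not give the annotation syntax or the library API, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical `data-xml` attribute as the annotation, and the names `AnnotationToXml` and `html_to_xml` are invented for illustration. Each annotated HTML element becomes an XML element of the given name, unannotated markup is dropped, and annotated elements that contain other annotated elements are treated as pure containers. The sketch also assumes well-nested HTML and annotation values that are valid XML names.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical annotation attribute; the paper's real syntax is not given here.
ANNOTATION_ATTR = "data-xml"

# Void elements have no closing tag, so they must not be tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class AnnotationToXml(HTMLParser):
    """Builds an XML tree from the annotated elements of an HTML page."""

    def __init__(self, root_name="document"):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element(root_name)
        self._open = [self.root]   # XML elements that are currently open
        self._annotated = []       # per open HTML tag: was it annotated?

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        name = dict(attrs).get(ANNOTATION_ATTR)
        if name:
            self._open.append(ET.SubElement(self._open[-1], name))
        self._annotated.append(bool(name))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._annotated:
            return
        if self._annotated.pop():
            self._open.pop()

    def handle_data(self, data):
        # Collect text only while inside at least one annotated element.
        if len(self._open) > 1:
            target = self._open[-1]
            target.text = (target.text or "") + data


def html_to_xml(html, root_name="document"):
    """Return an XML string built from the annotated parts of `html`."""
    parser = AnnotationToXml(root_name)
    parser.feed(html)
    parser.close()
    for elem in parser.root.iter():
        if len(elem):
            elem.text = None  # container element: keep structure only
        else:
            elem.text = " ".join((elem.text or "").split()) or None
    return ET.tostring(parser.root, encoding="unicode")


page = """
<div class="item" data-xml="book">
  <h2 data-xml="title">Document <em>Engineering</em></h2>
  <p class="meta">Price: <span data-xml="price">12.50</span><br></p>
</div>
"""

print(html_to_xml(page))
# <document><book><title>Document Engineering</title><price>12.50</price></book></document>
```

In this sketch the page author only adds one attribute per element of interest, and presentational markup such as `<em>` or the "Price:" label never reaches the XML output.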
