Conference: Distributed computing and artificial intelligence

Information Extraction from Heterogeneous Web Sites Using Clue Complement Process Based on a User's Instantiated Example

Abstract

Since the growth of the Internet, the World Wide Web has become a significant infrastructure in fields such as business, commerce, and education, and users increasingly gather information through it. However, as the number of Web pages grows, it becomes difficult for a user to collect the desired information. Although advanced Web search engines provide a partial solution, it is still up to the user to summarize or extract meaningful information from the retrieval results. From this viewpoint, this paper presents a method for generating table-style data from heterogeneous Web pages that reflects a user's intention. To achieve this, the method uses a user's instantiated example in the table in addition to the table's column labels. Based on this instantiated example, meaningful information is extracted using pattern matching and an N-gram method. To evaluate whether the proposed method is effective, we applied it to 57 pages from 27 travel agencies. The method achieved a precision of 88% and a recall of 68%.
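The abstract does not say how pattern matching and N-gram scoring are combined. As a rough illustration only, the Python sketch below assumes one plausible scheme. The user's example value for a column (hypothetically `"JPY 35,000"` for a price column) is generalized into a character-class regex. Every matching string on a page becomes a candidate, and candidates are ranked by the Dice coefficient of their character bigrams against the example. The helper names (`shape_pattern`, `extract_candidates`, etc.), the bigram size, the Dice measure, and the 0.3 threshold are all assumptions, not the paper's method.

```python
import re
from collections import Counter

# Hypothetical sketch: NOT the paper's algorithm, only one way to pair
# example-driven pattern matching with character N-gram similarity.


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Multiset of character n-grams; text is padded with one space on each side."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dice_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over character n-gram multisets, in [0, 1]."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 0.0
    overlap = sum((ga & gb).values())  # & keeps the minimum count of each shared n-gram
    return 2 * overlap / total


def shape_pattern(example: str) -> re.Pattern:
    r"""Generalize an example value into a regex by character class:
    digit runs -> \d+, letter runs -> [^\W\d_]+, whitespace runs -> \s+,
    and any other character is matched literally."""
    parts = []
    for tok in re.findall(r"\d+|[^\W\d_]+|\s+|.", example, re.DOTALL):
        if re.fullmatch(r"\d+", tok):
            parts.append(r"\d+")
        elif re.fullmatch(r"[^\W\d_]+", tok):
            parts.append(r"[^\W\d_]+")
        elif re.fullmatch(r"\s+", tok):
            parts.append(r"\s+")
        else:
            parts.append(re.escape(tok))
    return re.compile("".join(parts))


def extract_candidates(example: str, page_text: str, n: int = 2,
                       threshold: float = 0.3) -> list[tuple[str, float]]:
    """Return (candidate, score) pairs whose shape matches `example` and whose
    n-gram similarity to it is at least `threshold`, best first (ties by text)."""
    pattern = shape_pattern(example)
    hits = {m.group(0) for m in pattern.finditer(page_text)}  # de-duplicate matches
    scored = [(h, dice_similarity(example, h, n)) for h in hits]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical page text; the user's example for a "price" column is "JPY 35,000".
    page = "Kyoto tour JPY 42,800 per person; Osaka tour JPY 29,500; call 03-1234"
    for value, score in extract_candidates("JPY 35,000", page):
        print(f"{value}\t{score:.3f}")
```

In this toy run both prices are extracted and the phone number is rejected by the shape pattern, but the two prices tie on score. That suggests shape and N-gram overlap alone probably cannot decide which value belongs in which row. The sketch ignores page layout and the context around each match.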