Published in: Advances in Networks

Extracting Textual Information from Google Using Wrapper Class



Abstract

In general, web text documents come in structured, unstructured, or semi-structured formats, and their volume grows every day. Users are provided with many tools for finding relevant information: keyword searching and topic and subject browsing can help them locate it quickly. In addition, index search mechanisms allow the user to retrieve a set of relevant documents. Occasionally these search mechanisms are not sufficient. With the rapid development of the Internet, the amount of data available on the web has steadily increased, which makes it difficult for humans to distinguish relevant information. A wrapper class is proposed to extract relevant text information, with a focus on finding useful facts from unstructured web documents using Google. Techniques from information retrieval (IR), information extraction (IE), and pattern recognition are explored.
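The abstract does not describe how the proposed wrapper class is implemented, so the following is only an illustrative sketch of the general idea: a small class that wraps an HTML parser, collects the visible text of a page, and keeps the passages that mention a query term. The names `TextWrapper` and `extract_relevant`, the keyword-matching rule, and the use of Python's standard `html.parser` module are all assumptions made for this example and are not taken from the paper.

```python
from html.parser import HTMLParser


class TextWrapper(HTMLParser):
    """Hypothetical wrapper that collects the visible text of an HTML page."""

    # Content of these tags is never shown to the reader, so it is skipped.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped tag
        self.chunks = []       # visible text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_relevant(html, keyword):
    """Return the visible text fragments that contain `keyword` (case-insensitive).

    Simple substring matching is an assumption standing in for whatever
    relevance criterion the paper actually uses.
    """
    parser = TextWrapper()
    parser.feed(html)
    parser.close()
    needle = keyword.lower()
    return [chunk for chunk in parser.chunks if needle in chunk.lower()]
```

In this sketch, the page source would be fetched separately (for example, from a search result the user has chosen) and passed to `extract_relevant` as a string. A real system would likely replace the substring test with the IR, IE, and pattern-recognition techniques the abstract mentions.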
