Journal: Applied Artificial Intelligence

CRAWLING THE CONSTRUCTION WEB - A MACHINE-LEARNING APPROACH WITHOUT NEGATIVE EXAMPLES


Abstract

Professionals and craftsmen in the construction sector rely heavily on information in their decision-making processes, but make only limited use of the abundant information that is potentially available to them, particularly on the web. Consequently, designs are impoverished, construction is defective, and innovation is delayed. To facilitate user-friendly access to focused information, we have developed a question-and-answer (Q-A) system (reported elsewhere). To support this system, we have developed an automated crawler that builds a bank of relevant pages adapted to the needs of this particular industry-user community. It is based on a machine-learning framework in which an intelligent decision unit is trained to distinguish between nontopic and informative pages. We show that standard approaches, which use both positive and negative classes, are sensitive to noise in the negative class. Since initially one has only a limited amount of positive information labeled by human experts, we propose and evaluate different techniques for learning without negative examples. Our crawler, which uses the positive examples-based learning (PEBL) framework, is able to collect construction-oriented pages with high precision and a high discovery rate. It can also be used to build domain-specific collections of pages in other scientific or professional contexts.
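
To illustrate the general idea of training a page filter from positive examples only, here is a minimal sketch. It is not the PEBL framework or the crawler described in the abstract: it swaps in a much simpler centroid-based one-class filter, and every name in it (`train_positive_only`, `is_relevant`, the sample pages) is hypothetical. It assumes pages are already available as plain text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(text):
    """Represent a page as a term-frequency vector (a Counter)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train_positive_only(positive_pages):
    """Build a centroid from expert-labeled positive pages only.

    Assumption (not from the source): the acceptance threshold is the
    lowest similarity any training page has to the centroid, so no
    negative examples are needed to set it.
    """
    vectors = [vectorize(p) for p in positive_pages]
    centroid = Counter()
    for v in vectors:
        centroid.update(v)  # sum term counts across all positive pages
    threshold = min(cosine(v, centroid) for v in vectors)
    return centroid, threshold


def is_relevant(page, centroid, threshold):
    """Accept a page if it is at least as close to the centroid as the
    least typical positive training page."""
    return cosine(vectorize(page), centroid) >= threshold


if __name__ == "__main__":
    # Hypothetical expert-labeled construction pages.
    positives = [
        "concrete slab reinforcement and formwork for building construction",
        "roof insulation and timber framing in building construction",
        "masonry wall construction with concrete blocks and mortar",
    ]
    centroid, threshold = train_positive_only(positives)
    candidate = "football league results, player transfers this season"
    print(is_relevant(candidate, centroid, threshold))
```

A threshold taken from the least typical training page is deliberately strict; a real crawler would likely tune it on held-out positive pages, trading precision against discovery rate.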
