Proceedings of the 4th ACM International Conference on Web Search and Data Mining

Normalizing Web Product Attributes and Discovering Domain Ontology with Minimal Effort *

Abstract

We have developed a framework for normalizing product attributes from Web pages collected from different Web sites, without the need for labeled training examples. It can handle, in an unsupervised manner, pages that differ in layout format and content. As a result, it can be applied to a variety of domains with minimal effort. Our model is a generative probabilistic graphical model that incorporates Hidden Markov Models (HMMs) and considers both attribute names and attribute values, so that text fragments are extracted from Web pages and normalized in a unified manner. A Dirichlet Process is employed to handle the potentially unlimited number of attributes in a domain. An unsupervised inference method is proposed to predict the unobservable variables. We have also developed a method to automatically construct a domain ontology from the normalized product attributes, which are the output of inference on the graphical model. We have conducted extensive experiments on product Web pages collected from real-world Web sites in three different domains, comparing against existing work, to demonstrate the effectiveness of our framework.
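To make the HMM component more concrete, here is a minimal sketch of Viterbi decoding, which labels each token on a page with its most likely hidden state. It assumes a plain first-order HMM with three hypothetical states (NAME, VALUE, OTHER) and hand-set toy probabilities. It is not the paper's model: there, the HMM is embedded in a larger generative model whose parameters are inferred without labels. Here the parameters are fixed by hand so that only the decoding step is shown.

```python
import math

# Hypothetical hidden states for page tokens (an assumption for illustration,
# not the paper's actual state space).
STATES = ("NAME", "VALUE", "OTHER")

def viterbi(tokens, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most likely state sequence for `tokens` under a first-order HMM.

    Log-probabilities avoid numeric underflow on long pages; tokens missing
    from a state's emission table get probability `unk`.
    """
    # best[t][s] = (log-prob of the best path ending in state s at step t, backpointer)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s].get(tokens[0], unk)), None)
             for s in STATES}]
    for t in range(1, len(tokens)):
        row = {}
        for s in STATES:
            # Pick the predecessor state that maximizes path score into s.
            prev, score = max(
                ((p, best[t - 1][p][0] + math.log(trans_p[p][s])) for p in STATES),
                key=lambda x: x[1],
            )
            row[s] = (score + math.log(emit_p[s].get(tokens[t], unk)), prev)
        best.append(row)
    # Backtrack from the highest-scoring final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(tokens) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]

# Toy, hand-set parameters (hypothetical; in the paper these would be inferred).
start_p = {"NAME": 0.5, "VALUE": 0.1, "OTHER": 0.4}
trans_p = {
    "NAME": {"NAME": 0.2, "VALUE": 0.7, "OTHER": 0.1},
    "VALUE": {"NAME": 0.4, "VALUE": 0.3, "OTHER": 0.3},
    "OTHER": {"NAME": 0.4, "VALUE": 0.1, "OTHER": 0.5},
}
emit_p = {
    "NAME": {"resolution": 0.3, "weight": 0.3, ":": 0.2},
    "VALUE": {"12.1": 0.2, "mp": 0.2, "1.2": 0.2, "kg": 0.2},
    "OTHER": {"buy": 0.3, "now": 0.3},
}

tokens = ["resolution", "12.1", "mp", "weight", "1.2", "kg"]
print(viterbi(tokens, start_p, trans_p, emit_p))
# → ['NAME', 'VALUE', 'VALUE', 'NAME', 'VALUE', 'VALUE']
```

Consecutive NAME/VALUE runs in the decoded sequence can then be read off as candidate (attribute name, attribute value) pairs. Grouping pairs from different sites under one normalized attribute is the harder part that the paper's full model addresses.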
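The abstract uses a Dirichlet Process so that the number of attributes in a domain does not have to be fixed in advance. One standard way to see this property is the Chinese restaurant process (CRP), which samples cluster assignments from a Dirichlet Process prior. The sketch below assumes each observed text fragment is an "item" and each cluster stands for a hypothetical attribute. It only samples from the prior with concentration parameter `alpha`. It does not perform the paper's unsupervised inference, and `crp_assign` is an illustrative name, not part of the original work.

```python
import random
from collections import Counter

def crp_assign(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    Item i joins existing cluster k with probability n_k / (i + alpha) and opens a
    new cluster with probability alpha / (i + alpha). The number of clusters is
    therefore not fixed in advance; in expectation it grows roughly
    logarithmically with n_items.
    """
    rng = random.Random(seed)
    counts = Counter()  # cluster id -> number of items assigned so far
    labels = []
    for _ in range(n_items):
        clusters = list(counts)
        # One weight per existing cluster, plus alpha for a brand-new cluster
        # whose id is len(clusters) (ids stay consecutive: 0, 1, 2, ...).
        weights = [counts[k] for k in clusters] + [alpha]
        choice = rng.choices(clusters + [len(clusters)], weights=weights)[0]
        counts[choice] += 1
        labels.append(choice)
    return labels

labels = crp_assign(50, alpha=1.0)
print("distinct clusters:", len(set(labels)))
```

A larger `alpha` makes new clusters (new attributes) more likely, which is how a DP-based model can accommodate domains whose attribute sets are not known ahead of time.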
