Conference: Foundations of Intelligent Systems

Extracting Product Descriptions from Polish E-Commerce Websites Using Classification and Clustering


Abstract

A novel method for extracting product descriptions from e-commerce websites is presented. The algorithm consists of three major steps: (1) extracting descriptions of appropriate length from the source documents related to the search query, using shallow text analysis methods; (2) assigning each description to one of the predefined categories by means of text classification; and (3) grouping the results with a text clustering algorithm and returning the descriptions found in the highest-quality clusters. The recall and precision of the search are examined using a set of queries for laptops currently sold on popular shopping sites. It is shown that, although the method based purely on classification and the method based purely on clustering both give acceptable results, the highest precision is achieved when the two are used together. It was also observed that examining roughly the first 20 sites returned by Google is sufficient to obtain high-quality descriptions of popular products.
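The three-step pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Splitting pages on blank lines with hypothetical word-count bounds (`min_words`, `max_words`) stands in for their shallow text analysis. A nearest-prototype bag-of-words classifier stands in for their text classifier. Single-pass (leader) clustering by cosine similarity stands in for their clustering algorithm. Cluster size is used as a hypothetical quality measure. All function names, thresholds, and category labels are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-cased word tokens (Unicode-aware, so Polish letters such as ą or ł are kept)."""
    return re.findall(r"\w+", text.lower())


def vectorize(text):
    """Bag-of-words term-frequency vector."""
    return Counter(tokens(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_candidates(document, min_words=20, max_words=200):
    """Step 1 (hypothetical shallow analysis): split on blank lines, keep blocks of plausible length."""
    blocks = (block.strip() for block in re.split(r"\n\s*\n", document))
    return [b for b in blocks if min_words <= len(tokens(b)) <= max_words]


def build_prototypes(examples):
    """Sum the vectors of labelled example texts into one prototype vector per category."""
    return {label: sum((vectorize(t) for t in texts), Counter())
            for label, texts in examples.items()}


def classify(text, prototypes):
    """Step 2 (stand-in classifier): nearest prototype by cosine; None if nothing overlaps."""
    vec = vectorize(text)
    scores = {label: cosine(vec, proto) for label, proto in prototypes.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def leader_cluster(texts, threshold=0.5):
    """Step 3 (stand-in clustering): add each text to the first cluster whose leader is similar enough."""
    clusters = []  # list of (leader_vector, member_texts)
    for text in texts:
        vec = vectorize(text)
        for leader, members in clusters:
            if cosine(vec, leader) >= threshold:
                members.append(text)
                break
        else:  # no sufficiently similar cluster: this text becomes a new leader
            clusters.append((vec, [text]))
    return [members for _, members in clusters]


def extract_descriptions(documents, prototypes, target="description",
                         min_words=20, max_words=200, threshold=0.5):
    """Run all three steps and return the members of the best cluster."""
    candidates = [c for doc in documents
                  for c in extract_candidates(doc, min_words, max_words)]
    kept = [c for c in candidates if classify(c, prototypes) == target]
    groups = leader_cluster(kept, threshold)
    # Hypothetical quality measure: the largest cluster, i.e. text repeated across most pages.
    return max(groups, key=len) if groups else []
```

Treating the largest cluster as the "highest-quality" one is an assumption of this sketch. It favours wording that recurs, with small variations, across several shops, whereas one-off text such as an accessory listing ends up in a small cluster of its own. The abstract does not say which quality criterion, classifier, or clustering algorithm the authors actually use.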