IEEE International Conference on Innovations in Intelligent Systems and Applications

Effective Unsupervised Matching of Product Titles with k-Combinations and Permutations


Abstract

The problem of matching product titles is of particular interest to both users and marketers. Users frequently search the Web to compare prices and characteristics, or to obtain and aggregate information provided by other users. Marketers often need broad knowledge of competitive policies, prices, and features to organize a promotional campaign for a group of products. To address this problem, recent studies have attempted to enrich product titles by exploiting Web search engines. More specifically, these methods submit a query for each product title. After the results have been collected, the most important words appearing in them are identified and appended to the title. Next, each word is assigned an importance score, and finally a similarity measure is applied to determine whether two or more titles refer to the same product. Nonetheless, these methods suffer from multiple problems, including poor scalability, slow retrieval of the required additional search results, and lack of flexibility. In this paper, we present a different approach that addresses all of these issues and is based on the morphological analysis of the product titles. In particular, our method operates in two phases. In the first phase, we compute the combinations of the words of the titles and record several statistics, such as word proximity and frequency values. In the second phase, we use this information to assign a score to each combination. The highest-scoring combination is then declared the label of the cluster that contains the product. The experimental evaluation of the algorithm on a real-world dataset demonstrated that, compared to three popular string similarity metrics, our approach achieves up to 36% better matching performance and at least 13 times faster execution.
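The two phases described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so the one used here (combination length squared, times frequency, divided by one plus the mean gap between the words) is purely an assumption for illustration, as are the whitespace tokenizer, the `max_k` limit, and all function names.

```python
from collections import defaultdict
from itertools import combinations


def tokenize(title):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    # dict.fromkeys drops repeated words while keeping first-seen order.
    return list(dict.fromkeys(title.lower().split()))


def collect_stats(titles, max_k=3):
    """Phase 1: enumerate k-combinations of each title's words and record
    a frequency and a proximity statistic for every combination."""
    freq = defaultdict(int)     # combination -> number of titles containing it
    gap_sum = defaultdict(int)  # combination -> summed count of skipped words
    per_title = []              # combinations found in each title
    for title in titles:
        words = tokenize(title)
        combos = []
        for k in range(1, min(max_k, len(words)) + 1):
            for idx in combinations(range(len(words)), k):
                # Sorting makes the key order-insensitive, so permutations
                # of the same words map to one combination.
                combo = tuple(sorted(words[i] for i in idx))
                freq[combo] += 1
                # Words skipped between the first and last chosen positions.
                gap_sum[combo] += idx[-1] - idx[0] - (k - 1)
                combos.append(combo)
        per_title.append(combos)
    return freq, gap_sum, per_title


def score(combo, freq, gap_sum):
    """Phase 2 scoring. Hypothetical formula, not the one from the paper:
    favors long, frequent combinations whose words sit close together."""
    mean_gap = gap_sum[combo] / freq[combo]
    return len(combo) ** 2 * freq[combo] / (1.0 + mean_gap)


def cluster_titles(titles, max_k=3):
    """Label each title with its highest-scoring combination and group
    titles that share a label."""
    freq, gap_sum, per_title = collect_stats(titles, max_k)
    clusters = defaultdict(list)
    for title, combos in zip(titles, per_title):
        if not combos:  # empty title: nothing to label
            continue
        label = max(combos, key=lambda c: score(c, freq, gap_sum))
        clusters[label].append(title)
    return dict(clusters)


if __name__ == "__main__":
    example = [
        "Apple iPhone 6 16GB Silver",
        "iPhone 6 Apple 16GB smartphone",
        "Samsung Galaxy S5 black",
        "Galaxy S5 Samsung 16GB",
    ]
    for label, members in cluster_titles(example).items():
        print(label, members)
```

Because the combination key is sorted, titles that list the same words in a different order still receive the same label. The number of combinations grows quickly with title length, which is why this sketch caps the combination size at `max_k`.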
