Conference: IEEE International Conference on Innovations in Intelligent Systems and Applications

Effective Unsupervised Matching of Product Titles with k-Combinations and Permutations

Abstract

The problem of matching product titles is of particular interest to both users and marketers. The former frequently search the Web to compare prices and characteristics, or to obtain and aggregate information provided by other users. The latter often require wide knowledge of competitive policies, prices, and features in order to organize a promotional campaign for a group of products. To address this problem, recent studies have attempted to enrich product titles by exploiting Web search engines. More specifically, these methods submit a query for each product title. After the results have been collected, the most important words that appear in them are identified and appended to the titles. Subsequently, each word is assigned an importance score, and finally a similarity measure is applied to determine whether two or more titles refer to the same product. Nonetheless, these methods have multiple problems, including poor scalability, slow retrieval of the required additional search results, and lack of flexibility. In this paper, we present a different approach that addresses all of these issues and is based on the morphological analysis of the product titles. In particular, our method operates in two phases. In the first phase, we compute the combinations of the words of the titles and record several statistics, such as word proximity and frequency values. In the second phase, we use this information to assign a score to each combination. The highest-scoring combination is then declared the label of the cluster that contains each product. The experimental evaluation of the algorithm on a real-world dataset demonstrated that, compared to three popular string similarity metrics, our approach achieves up to 36% better matching performance and at least 13 times faster execution.
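The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the `score` function, the `max_k` limit, the whitespace tokenizer, and all function names below are assumptions chosen for illustration. The sketch also keeps word order and does not handle permutations.

```python
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(title):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return title.lower().split()


def k_combinations(tokens, max_k):
    """Yield (combination, span) for every order-preserving k-combination,
    k = 1..max_k. The span is the number of positions the combination
    covers in the title, used here as a simple proximity statistic."""
    for k in range(1, min(max_k, len(tokens)) + 1):
        for idx in combinations(range(len(tokens)), k):
            yield tuple(tokens[i] for i in idx), idx[-1] - idx[0] + 1


def cluster_titles(titles, max_k=3):
    """Group titles under their highest-scoring word combination."""
    freq = Counter()      # number of titles containing each combination
    span_sum = Counter()  # sum of the tightest span per title
    per_title = []

    # Phase 1: enumerate combinations and record frequency and proximity.
    for title in titles:
        best_span = {}
        for combo, span in k_combinations(tokenize(title), max_k):
            if combo not in best_span or span < best_span[combo]:
                best_span[combo] = span
        for combo, span in best_span.items():
            freq[combo] += 1
            span_sum[combo] += span
        per_title.append(best_span)

    # Phase 2: score each combination (hypothetical formula: favors long,
    # frequent combinations whose words appear close together).
    def score(combo):
        k = len(combo)
        avg_span = span_sum[combo] / freq[combo]
        return k * k * freq[combo] * (k / avg_span)

    clusters = defaultdict(list)
    for title, combos in zip(titles, per_title):
        # Ties are broken lexicographically so the result is deterministic.
        label = max(combos, key=lambda c: (score(c), c))
        clusters[label].append(title)
    return dict(clusters)
```

Calling `cluster_titles` on a list of title strings returns a dictionary that maps each winning combination (a tuple of words) to the titles assigned to it. Capping `max_k` keeps the number of enumerated combinations manageable for long titles, at the cost of never producing labels longer than `max_k` words.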