Effective Unsupervised Matching of Product Titles with k-Combinations and Permutations

Abstract

The problem of matching product titles is of particular interest to both users and marketers. The former frequently search the Web to compare prices and characteristics, or to obtain and aggregate information provided by other users. The latter often require wide knowledge of competitive policies, prices, and features to organize a promotional campaign for a group of products. To address this problem, recent studies have attempted to enrich product titles by exploiting Web search engines. More specifically, these methods suggest submitting a query for each product title. After the results have been collected, the most important words that appear in them are identified and appended to the titles. Subsequently, each word is assigned an importance score and, finally, a similarity measure is applied to determine whether two or more titles refer to the same product. Nonetheless, these methods have multiple problems, including poor scalability, slow retrieval of the required additional search results, and lack of flexibility. In this paper, we present a different approach that addresses all these issues and is based on the morphological analysis of the product titles. In particular, our method operates in two phases. In the first phase, we compute the combinations of the words of the titles and record several statistics, such as word proximity and frequency values. In the second phase, we use this information to assign a score to each combination. The highest-scoring combination is then declared the label of the cluster that contains each product. The experimental evaluation of the algorithm on a real-world dataset demonstrated that, compared to three popular string similarity metrics, our approach achieves up to 36% better matching performance and at least 13 times faster execution.
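
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the `score` function below (frequency times the squared combination length) is a hypothetical stand-in, and word proximity and permutations are ignored for brevity. The names `tokenize`, `word_combinations`, `cluster_titles`, and the parameter `max_k` are likewise assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(title):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return title.lower().split()


def word_combinations(words, max_k):
    """Yield every k-combination (k = 1..max_k) of the distinct words.

    Words are sorted first, so the same set of words always produces
    the same tuple regardless of their order in the title.
    """
    unique = sorted(set(words))
    for k in range(1, min(max_k, len(unique)) + 1):
        yield from combinations(unique, k)


def cluster_titles(titles, max_k=3):
    """Group titles under their highest-scoring word combination."""
    # Phase 1: enumerate combinations and count in how many titles each occurs.
    frequency = Counter()
    per_title = []
    for title in titles:
        combos = list(word_combinations(tokenize(title), max_k))
        per_title.append(combos)
        frequency.update(combos)

    # Phase 2: score each combination.
    # Hypothetical score (not from the paper): frequency * k^2,
    # which favours longer combinations shared by several titles.
    def score(combo):
        return frequency[combo] * len(combo) ** 2

    clusters = defaultdict(list)
    for title, combos in zip(titles, per_title):
        # Ties are broken lexicographically; an empty title gets the label ().
        label = max(combos, key=lambda c: (score(c), c), default=())
        clusters[label].append(title)
    return dict(clusters)


if __name__ == "__main__":
    sample = ["Apple iPhone 7 32GB", "iPhone 7 Apple black", "Samsung Galaxy S8"]
    for label, members in cluster_titles(sample).items():
        print(label, members)
```

Because every combination is stored as a sorted tuple, two titles that list the same words in a different order still share a label. Since the number of combinations grows quickly with title length, `max_k` is kept small in this sketch.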

Bibliographic record

Source
IEEE International Conference on Innovations in Intelligent Systems and Applications | 2018 | pp. 1-10 | 10 pages
Conference location: Thessaloniki (GR)
Authors
Leonidas Akritidis; Panayiotis Bozanis
Author affiliation

Data Structuring Engineering Lab, University of Thessaly, Volos, Greece

Original format: PDF
Keywords
Measurement; Feeds; Web search; Engines; Clustering algorithms; Indexes; Scalability

Similar literature

1. A self-verifying clustering approach to unsupervised matching of product titles [J]. Akritidis Leonidas, Fevgas Athanasios, Bozanis Panayiotis. Artificial Intelligence Review: An International Science and Engineering Journal, 2020, No. 7
2. Matching Seqlets: An Unsupervised Approach for Locality Preserving Sequence Matching [J]. Qiu Jiayan, Wang Xinchao, Fua Pascal. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, No. 2
3. Unsupervised group matching with application to cross-lingual topic matching without alignment information [J]. Iwata Tomoharu, Kanagawa Motonobu, Hirao Tsutomu. Data Mining and Knowledge Discovery, 2017, No. 2
4. Effective Unsupervised Matching of Product Titles with k-Combinations and Permutations [C]. Leonidas Akritidis, Panayiotis Bozanis. IEEE International Conference on Innovations in Intelligent Systems and Applications, 2018
5. Automatic Detection of Section Title and Prose Text in HTML Documents Using Unsupervised and Supervised Learning [D]. Mysore Gopinath, Abhijith Athreya, 2018
6. County-level phenomapping to identify disparities in cardiovascular outcomes: An unsupervised clustering analysis: Short title: Unsupervised clustering of counties and risk of cardiovascular mortality [O]. Matthew W. Segar, Shreya Rao, Ann Marie Navar, 2020
7. Permutation Statistics of Products of Random Permutations [O]. Axel Hultman, 2016
