Effective Unsupervised Matching of Product Titles with k-Combinations and Permutations


Abstract

The problem of matching product titles is of particular interest to both users and marketers. The former frequently search the Web to compare prices and characteristics, or to obtain and aggregate information provided by other users. The latter often require wide knowledge of competitive policies, prices, and features in order to organize a promotional campaign for a group of products. To address this problem, recent studies have attempted to enrich product titles by exploiting Web search engines. More specifically, these methods submit a query for each product title. After the results have been collected, the most important words that appear in them are identified and appended to the titles. Subsequently, each word is assigned an importance score and, finally, a similarity measure is applied to determine whether two or more titles refer to the same product. Nonetheless, these methods have multiple problems, including limited scalability, slow retrieval of the required additional search results, and lack of flexibility. In this paper, we present a different approach that addresses all these issues and is based on the morphological analysis of the product titles. In particular, our method operates in two phases. In the first phase, we compute the combinations of the words of the titles and record several statistics, such as word proximity and frequency values. In the second phase, we use this information to assign a score to each combination. The highest-scoring combination is then declared the label of the cluster that contains each product. The experimental evaluation of the algorithm on a real-world dataset demonstrated that, compared to three popular string similarity metrics, our approach achieves up to 36% better matching performance and at least 13 times faster execution.
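The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the `score` function, the `max_k` limit, the whitespace tokenization, and the choice to sort the words of each combination (so that permutations of the same words coincide) are all assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def candidate_combinations(title, max_k=3):
    """Yield (combination, span) pairs for one title.

    A combination is a sorted tuple of words (an assumption, so that
    reordered titles produce the same key). The span is the width of the
    window covering the chosen words, used as a simple proximity measure.
    """
    words = title.lower().split()
    for k in range(1, min(max_k, len(words)) + 1):
        for idx in combinations(range(len(words)), k):
            combo = tuple(sorted(words[i] for i in idx))
            span = idx[-1] - idx[0] + 1
            yield combo, span


def build_statistics(titles, max_k=3):
    """Phase 1: record frequency and proximity statistics per combination."""
    freq = defaultdict(int)        # number of titles containing the combination
    total_span = defaultdict(int)  # sum of the tightest span in each title
    for title in titles:
        tightest = {}
        for combo, span in candidate_combinations(title, max_k):
            if combo not in tightest or span < tightest[combo]:
                tightest[combo] = span
        for combo, span in tightest.items():
            freq[combo] += 1
            total_span[combo] += span
    return freq, total_span


def score(combo, freq, total_span):
    """Hypothetical score: favors long, frequent, compact combinations."""
    k = len(combo)
    avg_span = total_span[combo] / freq[combo]
    compactness = k / avg_span  # equals 1.0 when the words are adjacent
    return k * k * freq[combo] * compactness


def label_titles(titles, max_k=3):
    """Phase 2: assign each title its highest-scoring combination as label."""
    freq, total_span = build_statistics(titles, max_k)
    labels = {}
    for title in titles:
        combos = {c for c, _ in candidate_combinations(title, max_k)}
        # Ties are broken by the combination itself to keep results deterministic.
        labels[title] = max(combos, key=lambda c: (score(c, freq, total_span), c))
    return labels


if __name__ == "__main__":
    titles = [
        "apple iphone 7 32gb",
        "iphone 7 apple black",
        "samsung galaxy s8",
        "galaxy s8 samsung 64gb",
    ]
    for title, label in label_titles(titles).items():
        print(f"{title!r} -> {label}")
```

Titles that receive the same label fall into the same cluster. Because no search-engine queries are issued, the only inputs are the titles themselves, which is the property the abstract credits for the speed advantage; the number of combinations does grow quickly with `max_k`, so a small limit is assumed here.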


Bibliographic details

Source
IEEE International Conference on Innovations in Intelligent Systems and Applications | 2018 | 1 v. | 10 pages
Authors
Leonidas Akritidis; Panayiotis Bozanis;
Original format: PDF
Chinese Library Classification: Artificial intelligence theory
Keywords
Measurement; Feeds; Web search; Engines; Clustering algorithms; Indexes; Scalability;


