IDENTIFYING SOURCE DATASETS THAT FIT A TRANSFER LEARNING PROCESS FOR A TARGET DOMAIN

Abstract

A method for quantifying a similarity between a target dataset and multiple source datasets and identifying one or more source datasets that are most similar to the target dataset is provided. The method includes receiving, at a computing system, source datasets relating to a source domain and a target dataset relating to a target domain of interest. Each dataset is arranged in a tabular format including columns and rows, and the source datasets and the target dataset include a same feature space. The method also includes pre-processing, via a processor of the computing system, each source-target dataset pair to remove non-intersecting columns. The method further includes calculating at least two of a dataset similarity score, a row similarity score, and a column similarity score for each source-target dataset pair, and summarizing the calculated similarity scores to identify one or more source datasets that are most similar to the target dataset.
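
The steps in the abstract (drop non-intersecting columns, score each source-target pair, summarize the scores) can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not say how any score is computed, so the column score (closeness of column means), the row score (nearest-neighbor Euclidean distance), the averaging used to summarize, and every function name here are assumptions. The sketch computes two of the three score types, which satisfies the "at least two" wording.

```python
from math import sqrt
from statistics import mean
from typing import Dict, List, Tuple

# A table is modeled as column name -> numeric column values (an assumption).
Table = Dict[str, List[float]]


def intersect_columns(source: Table, target: Table) -> Tuple[Table, Table]:
    """Pre-processing: keep only columns present in both tables."""
    shared = sorted(set(source) & set(target))
    return {c: source[c] for c in shared}, {c: target[c] for c in shared}


def column_similarity(source: Table, target: Table) -> float:
    """Hypothetical column score: closeness of per-column means, averaged."""
    return mean(1.0 / (1.0 + abs(mean(source[c]) - mean(target[c]))) for c in source)


def rows(table: Table) -> List[Tuple[float, ...]]:
    """Turn the column-oriented table into a list of row tuples."""
    return list(zip(*table.values()))


def row_similarity(source: Table, target: Table) -> float:
    """Hypothetical row score: each target row against its nearest source row."""
    source_rows = rows(source)

    def nearest(row: Tuple[float, ...]) -> float:
        return min(sqrt(sum((a - b) ** 2 for a, b in zip(row, s))) for s in source_rows)

    return mean(1.0 / (1.0 + nearest(r)) for r in rows(target))


def rank_sources(sources: Dict[str, Table], target: Table) -> List[Tuple[str, float]]:
    """Summarize the scores per source (plain average, an assumption) and rank."""
    ranked = []
    for name, source in sources.items():
        s, t = intersect_columns(source, target)
        if not s:  # no shared columns: treat as entirely dissimilar
            ranked.append((name, 0.0))
            continue
        ranked.append((name, mean([column_similarity(s, t), row_similarity(s, t)])))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up example data, for illustration only.
target = {"age": [30.0, 40.0], "income": [50.0, 60.0]}
sources = {
    "close": {"age": [31.0, 39.0], "income": [50.0, 61.0], "extra": [1.0, 2.0]},
    "far": {"age": [80.0, 90.0], "income": [5.0, 6.0]},
}
ranking = rank_sources(sources, target)
print(ranking[0][0])  # name of the most similar source dataset
```

Both illustrative scores fall in the range (0, 1], so averaging them is meaningful here. Real scores on different scales would need normalization before being combined.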

Bibliographic data

Publication number: US2022027339A1
Publication date: 2022-01-27
Original format: PDF
Applicant/assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Application number: US202016934492
Inventors: BAR HAIM; ANDREY FINKELSHTEIN; EITAN MENAHEM; NOGA AGMON
Filing date: 2020-07-21
Classification: G06F16/23; G06F16/22; G06N20; G06K9/62
Country: US
Date indexed: 2022-08-24 23:33:12
