Algorithms for Storytelling

Abstract

We formulate a new data mining problem called storytelling as a generalization of redescription mining. In traditional redescription mining, we are given a set of objects and a collection of subsets defined over these objects. The goal is to view the set system as a vocabulary and identify two expressions in this vocabulary that induce the same set of objects. Storytelling, on the other hand, aims to explicitly relate object sets that are disjoint (and hence maximally dissimilar) by finding a chain of (approximate) redescriptions between the sets. This problem has applications in bioinformatics, for instance, where a biologist is trying to relate a set of genes expressed in one experiment to another set implicated in a different pathway. We outline an efficient storytelling implementation that embeds the CARTwheels redescription mining algorithm in an A* search procedure, with CARTwheels supplying the next-move operators for the search branches. This approach is practical and effective for mining large datasets and, at the same time, exploits the structure of the partitions imposed by the given vocabulary. Three application case studies are presented: studying word overlaps in large English dictionaries, exploring connections between genesets in a bioinformatics dataset, and relating publications in the PubMed index of abstracts.
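The sketch below shows how a story can be found with A* search. It is a simplified illustration and not the authors' implementation. CARTwheels is replaced by a hypothetical, fixed vocabulary `vocab` of precomputed object sets. Two sets count as one story step when their Jaccard coefficient is at least a threshold `theta`, which is an assumed approximation criterion. The names `story`, `jaccard`, `vocab` and `theta` are illustrative only.

The heuristic relies on Jaccard distance (1 minus the coefficient) being a metric. Every hop covers a distance of at most 1 - theta, so ceil(d(X, goal) / (1 - theta)) never overestimates the number of hops that remain.

```python
import heapq
import math
from fractions import Fraction
from itertools import count


def jaccard(a, b):
    """Exact Jaccard coefficient |a & b| / |a | b|, returned as a Fraction."""
    union = a | b
    if not union:
        return Fraction(1)
    return Fraction(len(a & b), len(union))


def story(vocab, start, goal, theta):
    """A* search for the shortest chain start -> ... -> goal in which every
    consecutive pair of sets has a Jaccard coefficient >= theta.

    vocab: dict name -> frozenset of objects (a hypothetical stand-in for the
    descriptors that CARTwheels would propose as next moves).
    Returns the list of names on the chain, or None if no story exists.
    """
    theta = Fraction(theta)  # exact arithmetic keeps the heuristic admissible
    if not 0 < theta < 1:
        raise ValueError("theta must lie strictly between 0 and 1")
    step = 1 - theta  # largest Jaccard distance a single hop may cover
    target = vocab[goal]

    def h(name):
        # Triangle inequality: the remaining hops must cover at least
        # d(name, goal), and each hop covers at most `step`.
        return math.ceil((1 - jaccard(vocab[name], target)) / step)

    tie = count()  # tie-breaker so the heap never compares paths
    frontier = [(h(start), next(tie), 0, start, [start])]
    best = {start: 0}  # cheapest known hop count to each set
    while frontier:
        _, _, g, name, path = heapq.heappop(frontier)
        if name == goal:
            return path
        if g > best.get(name, math.inf):
            continue  # stale heap entry
        for nxt, objs in vocab.items():
            if nxt == name or jaccard(vocab[name], objs) < theta:
                continue  # not an approximate redescription of the current set
            ng = g + 1
            if ng < best.get(nxt, math.inf):
                best[nxt] = ng
                heapq.heappush(
                    frontier, (ng + h(nxt), next(tie), ng, nxt, path + [nxt])
                )
    return None


if __name__ == "__main__":
    # Toy data, invented for illustration: A and C are disjoint.
    vocab = {
        "A": frozenset({1, 2, 3, 4}),
        "B": frozenset({3, 4, 5, 6}),
        "C": frozenset({5, 6, 7, 8}),
        "D": frozenset({1, 8}),
    }
    print(story(vocab, "A", "C", Fraction(1, 3)))  # prints ['A', 'B', 'C']
```

In this toy run, B bridges the disjoint sets A and C because it shares two of six objects with each. Raising `theta` to 1/2 leaves A with no admissible neighbor, so `story` returns `None`.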