10th IEEE International Conference on Data Mining

Mother Fugger: Mining Historical Manuscripts with Local Color Patches


Abstract

Initiatives such as the Google Print Library Project and the Million Book Project have already archived more than ten million books in digital format, and within the next decade the majority of the world's books will be online. Although most of the data will naturally be text, there will also be tens of millions of pages of images, many in color. While there is an active research community pursuing data mining of text from historical manuscripts, there has been very little work that exploits the rich color information which is often present. In this work we introduce a simple color measure which both addresses and exploits typical features of historical manuscripts. To enable the efficient mining of massive archives, we propose a tight lower bound to the measure. Beyond fast similarity search, we show how this lower bound allows us to build several higher-level data mining tools, including motif discovery and link analyses. We demonstrate our ideas in several data mining tasks on manuscripts dating back to the fifteenth century.
