Published in: IEEE International Conference on Bioinformatics and Biomedicine Workshops

A Mixture Language Model for Class-Attribute Mining from Biomedical Literature Digital Library

Abstract

We define and study a novel text mining problem for biomedical literature digital libraries, referred to as class-attribute mining. Given a collection of biomedical literature from a digital library addressing a set of objects (e.g., proteins) and their descriptions (e.g., protein functions), the tasks of class-attribute mining are: (1) to identify and summarize latent classes in the space of objects, (2) to discover latent attribute themes in the space of object descriptions, and (3) to summarize the commonalities and differences among the identified classes along each attribute theme. We approach this mining problem through a mixture language model and estimate the parameters of the model using the EM algorithm. We demonstrate the effectiveness of the model on an application, protein community identification and annotation, using Medline, the largest biomedical literature digital library, with more than 16 million abstracts.
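The abstract does not spell out the structure of the mixture language model, so the following is only a generic illustration of the underlying idea: a minimal sketch of EM for a plain mixture of unigram language models, in which each document is assumed to be generated by one latent component. It is not the authors' class-attribute model. The function name `em_mixture_of_unigrams`, its parameters, and the toy documents are all hypothetical.

```python
import math
import random


def em_mixture_of_unigrams(docs, num_components, num_iters=50, smoothing=0.01, seed=0):
    """Fit a mixture of unigram language models with EM (illustrative only).

    docs: non-empty list of token lists with a non-empty vocabulary.
    Returns (priors, word_dists, responsibilities).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})

    # Bag-of-words counts for each document.
    counts = []
    for doc in docs:
        c = {}
        for w in doc:
            c[w] = c.get(w, 0) + 1
        counts.append(c)

    # Random, strictly positive initial word distribution per component.
    word_dists = []
    for _ in range(num_components):
        raw = {w: rng.random() + 0.1 for w in vocab}
        total = sum(raw.values())
        word_dists.append({w: v / total for w, v in raw.items()})
    priors = [1.0 / num_components] * num_components

    resp = []
    for _ in range(num_iters):
        # E-step: posterior P(component | document), computed in log space.
        resp = []
        for c in counts:
            log_post = [
                math.log(priors[k])
                + sum(n * math.log(word_dists[k][w]) for w, n in c.items())
                for k in range(num_components)
            ]
            m = max(log_post)
            unnorm = [math.exp(lp - m) for lp in log_post]
            z = sum(unnorm)
            resp.append([u / z for u in unnorm])

        # M-step: re-estimate word distributions and priors from soft counts.
        # Additive smoothing keeps every probability strictly positive.
        for k in range(num_components):
            weighted = {w: smoothing for w in vocab}
            for c, r in zip(counts, resp):
                for w, n in c.items():
                    weighted[w] += r[k] * n
            total = sum(weighted.values())
            word_dists[k] = {w: v / total for w, v in weighted.items()}
            priors[k] = (sum(r[k] for r in resp) + smoothing) / (
                len(docs) + num_components * smoothing
            )

    return priors, word_dists, resp


# Hypothetical toy corpus: two documents about kinases, two about transport.
toy_docs = [
    ["kinase", "phosphorylation", "kinase"],
    ["kinase", "signaling", "phosphorylation"],
    ["membrane", "transport", "channel"],
    ["channel", "membrane", "transport"],
]
priors, word_dists, resp = em_mixture_of_unigrams(toy_docs, num_components=2)
```

Each row of `resp` is a soft assignment of one document to the latent components, and each entry of `word_dists` is a word distribution that can be read as a summary of that component. EM converges only to a local optimum, so results depend on the random initialization.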
