
Text mining in a digital library


Abstract

Text mining can add value to the documents in digital library collections, and Greenstone already incorporates a few examples. We have described how a general text mining environment can be included within a digital library system. This provides a more satisfactory basis for putting future advances in text mining to work immediately for the benefit of digital library users. Our current implementation processes documents at display time only. In the future we plan to incorporate text mining into the collection-building process, so that the data produced can be used for searching and browsing as well as for document display. It will also be necessary to design ways of specifying the desired text mining operations in a collection's configuration file. Linking a general digital library system with a general text mining system opens up many other possibilities. Tracking entities across documents would allow coreferences to be hyperlinked automatically. Semantic indexing could be accomplished by annotating texts with ontology classes and supporting semantic searches rather than simple textual queries, providing some of the advantages of the Semantic Web. Document summaries could be generated automatically to serve as the "snippets" presented in lists of search results. Embedding GATE in a digital library system will also benefit GATE's own users, who can experiment on real text collections in a digital library rather than on specially constructed text corpora. This spells good news for digital library users.
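As a rough illustration of display-time processing and automatic hyperlinking of entity mentions, the following minimal Python sketch wraps known entity names in links when a document is rendered. It is not Greenstone's or GATE's API: the gazetteer contents, the `link_entities` function, and the `/entity/` URL scheme are all hypothetical, and a simple dictionary lookup stands in for a real named-entity and coreference pipeline.

```python
import html
import re

# Hypothetical gazetteer (assumption): surface form -> entity identifier.
GAZETTEER = {
    "Greenstone": "software/greenstone",
    "GATE": "software/gate",
    "Semantic Web": "concept/semantic-web",
}


def link_entities(text, gazetteer=GAZETTEER, base_url="/entity/"):
    """Return HTML in which each known entity mention links to an entity page."""
    if not gazetteer:
        return html.escape(text)
    # Try longer names first so multi-word names win over shorter overlaps.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    parts, last = [], 0
    for match in pattern.finditer(text):
        # Escape the plain text between mentions, then emit the link.
        parts.append(html.escape(text[last:match.start()]))
        href = html.escape(base_url + gazetteer[match.group(1)], quote=True)
        parts.append(f'<a href="{href}">{html.escape(match.group(1))}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

Calling `link_entities("GATE runs inside Greenstone.")` at display time would link both names to their hypothetical entity pages. Running the same lookup during collection building instead would let the identifiers be stored and indexed for searching and browsing, which is the direction the abstract proposes.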
