Proceedings of the 2010 5th International Conference on Information and Automation for Sustainability

txtKnot — Text clustering based concept hierarchy to generalize from different text sources


Abstract

In today's technology-dependent world, we rely heavily on electronically stored data and information to make sound and timely decisions. Across the information technology landscape there is an enormous volume of data containing information relevant to many different fields. The difficulty arises when we want to learn about a particular subject, because relevant and irrelevant data are scattered together. There is therefore a critical need for a system that imposes an ordered structure on this data, modeling the underlying relationships between data elements and ultimately making decisions much easier to reach. txtKnot addresses this problem by generating a meaningful hierarchy of concepts from a set of unsorted text documents, which makes it possible to visualize the relationships that exist within the document set. It consists of four main components: a Data Extraction Module, a Data Pre-processor Module, a Text Clustering Module, and a Concept Hierarchy Generation Module. These components are integrated to meet the system's main objective: an easy-to-use method for organizing, visualizing, searching, and filtering the huge amount of unsorted textual data available electronically.
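The four-module pipeline described above can be illustrated with a small, self-contained sketch. The abstract does not say which algorithms each module uses, so everything here is an assumption for illustration only: tag stripping stands in for data extraction, TF-IDF weighting with cosine similarity stands in for the document representation, and average-linkage agglomerative clustering produces the tree that serves as the concept hierarchy. The function names (`extract_text`, `preprocess`, `tfidf`, `cluster`, `label`, `render`) and the stop-word list are hypothetical and are not txtKnot's actual implementation.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real system would use a larger one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "for", "on", "with"}


def extract_text(raw: str) -> str:
    """Data extraction (assumed): strip HTML-like tags from a raw source string."""
    return re.sub(r"<[^>]+>", " ", raw)


def preprocess(text: str) -> list[str]:
    """Pre-processing: lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors; idf = log(1 + N/df) so shared terms keep some weight."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(1 + n / df[t]) for t, c in Counter(doc).items()} for doc in docs]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(vectors: list[dict[str, float]]) -> dict:
    """Text clustering (assumed): naive average-linkage agglomerative clustering.

    Repeatedly merges the two most similar clusters until one root remains;
    the resulting binary tree is used as the concept hierarchy.
    """
    sim = [[cosine(a, b) for b in vectors] for a in vectors]
    nodes = [{"docs": [i], "children": []} for i in range(len(vectors))]

    def linkage(x: dict, y: dict) -> float:
        pairs = [sim[i][j] for i in x["docs"] for j in y["docs"]]
        return sum(pairs) / len(pairs)

    while len(nodes) > 1:
        i, j = max(
            ((i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))),
            key=lambda p: linkage(nodes[p[0]], nodes[p[1]]),
        )
        merged = {"docs": nodes[i]["docs"] + nodes[j]["docs"], "children": [nodes[i], nodes[j]]}
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def label(node: dict, vectors: list[dict[str, float]], k: int = 2) -> list[str]:
    """Name a concept by the k terms with the highest summed TF-IDF in its documents."""
    totals: Counter = Counter()
    for i in node["docs"]:
        totals.update(vectors[i])
    return [t for t, _ in totals.most_common(k)]


def render(node: dict, vectors: list[dict[str, float]], depth: int = 0) -> list[str]:
    """Concept hierarchy generation: one indented line per node in the tree."""
    lines = ["  " * depth + "/".join(label(node, vectors)) + " " + str(sorted(node["docs"]))]
    for child in node["children"]:
        lines.extend(render(child, vectors, depth + 1))
    return lines


# Usage with a few made-up text sources.
raw_sources = [
    "<p>Neural networks learn representations</p>",
    "Deep neural networks and training",
    "Stock market prices and trading",
    "<div>Market trading strategies</div>",
]
docs = [preprocess(extract_text(r)) for r in raw_sources]
vectors = tfidf(docs)
root = cluster(vectors)
print("\n".join(render(root, vectors)))
```

With these four inputs, the two finance snippets and the two neural-network snippets end up as separate branches under the root, each labeled by its highest-weighted terms. The pairwise search is cubic in the number of documents, which is fine for a demonstration but not for a large corpus; average linkage is also only one of several reasonable merge criteria.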