DISTRIBUTED MULTI-LINGUAL CONTENT BASED TEXT MINING DML – CBTM

Abstract

With the explosion of information on the internet, extracting knowledge from media-based data (images, audio streams and videos), which is replacing textual data, is becoming more complex. A comprehensive methodology covering all forms of data is therefore needed, one that can provide the contents of the data in a short period of time. Text mining tools and algorithms are becoming increasingly popular as many books, texts and documents are converted to soft-copy versions and made globally accessible. Although this trend is predominantly in the English language, the need has arisen for such an approach in other languages too, as many ancient and out-of-print texts in different languages are getting ‘softer’ versions for preservation and for the extraction of information and knowledge. In the context of Indian languages this need is more pronounced, as many texts are available in different languages, scripts, dialects and material forms ranging from palm leaves to stone cuttings, holding a wealth of information in a variety of disciplines. In this paper, we propose a novel content-based approach, demonstrated for textual data in the first instance and termed CBTM (Content-Based Text-Mining), for knowledge discovery in multilingual texts. The proposed methodology employs a content-based approach using keywords and patterns stored in the form of gif strings so that extensions to other forms of data are possible. Potential applications of this approach in a distributed environment are also highlighted. We have used newspaper advertisements to demonstrate the system.
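The abstract gives no implementation details, so the following is only a minimal sketch of what keyword-based content classification over a language-wise keyword repository might look like for short advertisement texts. Everything in it is an assumption made for illustration: the repository contents, the category names, the script-based language guess, and the function names `detect_language` and `classify`. It does not model the gif-string storage of patterns mentioned above and should not be read as the authors' method.

```python
import unicodedata
from collections import Counter

# Hypothetical language-wise keyword repository: language -> category -> keywords.
KEYWORD_REPOSITORY = {
    "english": {
        "real_estate": {"flat", "plot", "rent", "house"},
        "jobs": {"wanted", "vacancy", "salary", "experience"},
    },
    "tamil": {
        "real_estate": {"வீடு", "வாடகை"},
        "jobs": {"வேலை", "சம்பளம்"},
    },
}

# Assumption: the script of the characters is enough to guess the language.
SCRIPT_TO_LANGUAGE = {"LATIN": "english", "TAMIL": "tamil"}


def detect_language(text):
    """Guess the language from the most frequent known script in the text."""
    counts = Counter()
    for ch in text:
        # Unicode character names start with the script, e.g. "TAMIL LETTER VA".
        script = unicodedata.name(ch, "").split(" ")[0]
        if script in SCRIPT_TO_LANGUAGE:
            counts[SCRIPT_TO_LANGUAGE[script]] += 1
    return counts.most_common(1)[0][0] if counts else None


def classify(text):
    """Return (language, category); category is None when no keyword matches."""
    language = detect_language(text)
    if language is None:
        return None, None
    # Whitespace tokenization keeps combining marks attached to their letters.
    tokens = {tok.strip(".,;:!?()\"'").lower() for tok in text.split()}
    scores = {
        category: len(tokens & keywords)
        for category, keywords in KEYWORD_REPOSITORY[language].items()
    }
    best = max(scores, key=scores.get)
    return language, (best if scores[best] > 0 else None)
```

Calling `classify` on an advertisement string returns the guessed language together with the category whose keywords overlap the text most. Keeping one keyword set per language is what would let new languages be added without touching the matching logic.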

Bibliographic details

Source
Proceedings of the Third IASTED International Conference on Advances in Computer Science and Technology | 2007 | pp. 500-505 | 6 pages
Conference location: Phuket (TH)
Authors
S. Chitrakala; D. Manjula
Author affiliation

International Association of Science and Technology for Development (IASTED);

Conference organizer
Original format: PDF
Text language: eng
Chinese Library Classification: Computing technology, computer technology
Keywords
Multilingual Text mining; content-based mining; association mining; Text content classification; Language-Wise-Keyword-Repository;

Similar literature

1. Accelerating text mining workloads in a MapReduce-based distributed GPU environment [J]. Peter Wittek, Sandor Daranyi. Journal of Parallel and Distributed Computing. 2013, No. 2
2. Analysis of content topics, user engagement and library factors in public library social media based on text mining [J]. Online Information Review. 2020, No. 1
3. Effective content-based pattern predicted text mining using PSE model [J]. P. Vijayakumar, S. Sukumaran. International Journal of Advanced Intelligence Paradigms. 2020, No. 1
4. DISTRIBUTED MULTI-LINGUAL CONTENT BASED TEXT MINING DML – CBTM [C]. S. Chitrakala, D. Manjula. IASTED International Conference on Advances in Computer Science and Technology. 2007
5. Text association mining with cross-sentence inference, structure-based document model and multi-relational text mining [D]. Thaicharoen, Supphachai. 2009
6. Correction to: A pattern learning-based method for temporal expression extraction and normalization from multi-lingual heterogeneous clinical texts [O]. Tianyong Hao, Xiaoyi Pan, Zhiying Gu, 2018
7. Accelerating Text Mining Workloads in a MapReduce-based Distributed GPU Environment [O]. Wittek, Peter; Darányi, Sándor. 2013
