Rule Mining in Textual Data Using Passages

Abstract

As interest in and the need for Knowledge Discovery and Data Mining (KDD) in texts increase, applying association rule mining, a successful standard KDD method, to texts has attracted great attention. Contrary to expectations, however, most of this work has yielded syntactic rules or word collocations, which are unsatisfying in the context of KDD, where the objective is to extract previously unknown, useful information. One reason for these disappointing results may be that most previous work processes texts on a syntactic basis. For example, past work has used, as items and transactions respectively, words and documents, words and windows, terms and documents, and words and passages (segments of text). Here we propose using passages as items and documents as transactions. According to [5], breaking long texts down into passages improves information retrieval results. This result indicates that passages are a good indication of users' interests. We follow and extend this view, taking passages as an indication of the topics in a document. Our goal is to find associations between topics in documents instead of associations between words. The important issue in using passages is how to compare passages, each of which usually consists of a set of words. Since the number and frequency of words appearing in a passage differ from passage to passage, passages cannot be compared directly; they must be converted into some other processable representation. In this paper we propose a representation of passages and discuss a way to compare passages that supports soft matching.
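
The idea of treating passages as items and documents as transactions can be illustrated with a minimal Python sketch. The abstract does not specify the paper's passage representation or matching procedure, so everything here is an assumption made for illustration: passages are represented as bag-of-words vectors, soft matching is a cosine-similarity threshold against the first passage seen for each item, and rules are mined only between pairs of items. The function names (`bag_of_words`, `cosine`, `passages_to_items`, `pair_rules`) and the thresholds are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def bag_of_words(passage):
    """Assumed representation: lowercase word counts of a passage."""
    return Counter(passage.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def passages_to_items(documents, threshold=0.5):
    """Turn each document (a list of passages) into a set of item ids.

    Soft matching (an assumption, not the paper's method): a passage is
    assigned to the first existing item whose prototype vector has
    cosine similarity >= threshold; otherwise it starts a new item.
    """
    prototypes = []    # one bag-of-words vector per item id
    transactions = []  # one set of item ids per document
    for doc in documents:
        items = set()
        for passage in doc:
            vec = bag_of_words(passage)
            for item_id, proto in enumerate(prototypes):
                if cosine(vec, proto) >= threshold:
                    items.add(item_id)
                    break
            else:
                prototypes.append(vec)
                items.add(len(prototypes) - 1)
        transactions.append(items)
    return transactions


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Mine rules lhs -> rhs between single items.

    Returns tuples (lhs, rhs, support, confidence).
    """
    n = len(transactions)
    item_count = Counter(i for t in transactions for i in t)
    pair_count = Counter(
        p for t in transactions for p in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules
```

With soft matching, two differently worded passages on the same topic can map to the same item, so a resulting rule relates topics across documents, not individual words. A fuller treatment would mine itemsets larger than pairs and use a less order-dependent grouping than this greedy first-match assignment.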

Bibliographic details

Source
International Symposium on Knowledge and Systems Sciences | 2004 | 6 pages
Authors
Kentaro Nagai; Tu Bao Ho; International Society for Knowledge and Systems Sciences (ISKSS); Japan Advanced Institute of Science and Technology (JAIST), Japan; Dalian University of Technology (DUT), China
Original format: PDF
Chinese Library Classification: Systems science
Keywords
text mining; association rule mining; passage

Similar literature

1. Association Rule Mining From Textual Data using Passages [J]. Kentaro Nagai, Tu Bao Ho. 電子情報通信学会技術研究報告. 人工知能と知識処理. Artificial Intelligence and Knowledge Based Processing. 2004, No. 485

2. Genetic algorithm rule based categorization method for textual data mining [J]. Afif M., Ghareb A., Saif A. Decision Science Letters. 2020, No. 1

3. Textual data science with R, Mónica Bécue‐Bertaut, Boca Raton: CRC Press [J]. Sánchez Brisa N. Biometrics: Journal of the Biometric Society: An International Society Devoted to the Mathematical and Statistical Aspects of Biology. 2019, No. 4

4. Rule Mining in Textual Data Using Passages [C]. Kentaro Nagai, Tu Bao Ho. International Symposium on Knowledge and Systems Sciences (KSS2004); November 10-12, 2004; Ishikawa (JP). 2004

5. Mining fuzzy association rules on large numerical data: A data mining system for NAWN [D]. Komo, Zimpi. 2003

6. COVID-19 and Media datasets: Period- and location-specific textual data mining [O]. Mathieu Roche. 2020

7. Distributed Higher Order Association Rule Mining Using Information Extracted from Textual Data [O]. Shenzhi Li. 2005

8. Centering Resonance Analysis: A Superior Data Mining Algorithm for Textual Data Streams [R]. Dooley, K., Corman, S., Ballard, D. 2004
