
Rule Mining in Textual Data Using Passages




As interest in and demand for Knowledge Discovery and Data Mining (KDD) in texts increase, applying association rule mining, a successful standard KDD method, to texts has attracted great attention. Contrary to expectations, however, most of this work has yielded only syntactic rules or word collocations, which are unsatisfying in the context of KDD, where the objective is to extract previously unknown, useful information. One reason for these disappointing results may be that most previous work processes texts on a syntactic basis. For example, past work has used, as items and transactions respectively, words and documents, words and windows, terms and documents, and words and passages (segments of text). Here we propose using passages as items and documents as transactions. According to [5], breaking long texts down into passages improves information retrieval results. This result suggests that passages are a good indication of users' interests. We follow and extend this view, taking passages as an indication of the topics in a document. Our goal is to find associations between topics in documents rather than associations between words. The key issue in using passages is how to compare them, since a passage usually consists of a set of words. Because the number and frequency of words vary from passage to passage, passages cannot be compared directly; they must first be converted to some other processable representation. In this paper we propose a representation of passages and discuss a way to compare passages that supports soft matching.
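
The abstract does not say which passage representation or matching function the paper uses. The sketch below is only an illustration of the general idea of passages as items and documents as transactions. It assumes bag-of-words term-frequency vectors and thresholded cosine similarity as the soft match; both are stand-in choices, not the paper's method. All function names, the threshold value, and the sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def passage_vector(passage):
    """Assumed representation: a bag-of-words term-frequency vector."""
    return Counter(passage.lower().split())


def cosine(u, v):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def assign_topics(documents, threshold=0.5):
    """Soft-match each passage to the most similar known topic.

    A passage joins an existing topic when its similarity to that topic's
    representative vector reaches the (hypothetical) threshold; otherwise
    it starts a new topic. Returns one transaction (a set of topic ids)
    per document.
    """
    topics = []        # representative vector for each topic id
    transactions = []
    for passages in documents:
        items = set()
        for passage in passages:
            vec = passage_vector(passage)
            best_id, best_sim = None, threshold
            for topic_id, representative in enumerate(topics):
                sim = cosine(vec, representative)
                if sim >= best_sim:
                    best_id, best_sim = topic_id, sim
            if best_id is None:
                topics.append(vec)
                best_id = len(topics) - 1
            items.add(best_id)
        transactions.append(items)
    return transactions


def pair_support(transactions):
    """Fraction of documents in which each pair of topic ids co-occurs."""
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: count / total for pair, count in counts.items()}


# Hypothetical documents, each already segmented into passages.
docs = [
    ["stock market prices fell", "central bank raised interest rates"],
    ["stock market prices rose", "central bank cut interest rates"],
    ["football team won the match"],
]
transactions = assign_topics(docs)
print(transactions)
print(pair_support(transactions))
```

In this toy input the first two documents receive the same two topic ids even though their passages differ by one word each, which is the effect soft matching is meant to give. The pair supports are the raw counts from which association rules between topics could then be derived.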




