Journal of Intelligent Systems

Event Mining Through Clustering



Abstract

Traditional document clustering algorithms cluster documents using text-based features such as unique word count and concept count. Event mining, by contrast, is the extraction of specific events, their related sub-events, and the associated semantic relations from documents. This work discusses an approach to event mining through clustering. A Universal Networking Language (UNL)-based subgraph, a semantic representation of the document, is used as the input for clustering. Our research explores three different feature sets for event clustering and compares the approaches used for specific event mining. In our previous work, the clustering algorithm used UNL-based event semantics to represent the event context. However, that approach clustered together different events that have similar semantics. Hence, instead of considering only UNL event semantics, we assigned additional weight to the similarity between event contexts that share event-related attributes such as time, place, and persons. Although this puts each specific event in a single cluster, the sub-events related to that event are not necessarily in the same cluster. Therefore, to improve cluster efficiency, the connective terms between two sentences, and their representation as UNL subgraphs, were also considered when determining similarity. By combining UNL semantics, event-specific argument similarity, and connective term concepts between sentences, we were able to obtain clusters for specific events and their sub-events. We used 112,000 Tamil documents from the Forum for Information Retrieval Evaluation data corpus and achieved good results. We also compared our approach with the previous state-of-the-art approach on the Reuters RCV1 corpus and achieved a 30% improvement in precision.
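The combination of the three feature sets can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the similarity measure, the weights, or the clustering algorithm, so the Jaccard overlap, the weights (0.5, 0.3, 0.2), the threshold, the greedy single-link grouping, and every name below (EventContext, combined_similarity, cluster) are assumptions made for illustration. The UNL-style triples are hand-written toy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventContext:
    """Toy stand-in for one event context (all fields are hypothetical)."""
    triples: frozenset      # UNL-style (relation, head, dependent) triples
    attributes: frozenset   # (kind, value) pairs for time, place, persons
    connectives: frozenset  # connective concepts linking adjacent sentences


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets are treated as dissimilar."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_similarity(x, y, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of semantic, attribute, and connective similarity.

    The weights are assumed values, not taken from the paper.
    """
    w_sem, w_attr, w_conn = weights
    return (w_sem * jaccard(x.triples, y.triples)
            + w_attr * jaccard(x.attributes, y.attributes)
            + w_conn * jaccard(x.connectives, y.connectives))


def cluster(events, threshold=0.4, weights=(0.5, 0.3, 0.2)):
    """Greedy single-link grouping: join the first cluster that contains
    a member at least `threshold` similar, else start a new cluster."""
    clusters = []
    for i, ev in enumerate(events):
        for members in clusters:
            if any(combined_similarity(ev, events[j], weights) >= threshold
                   for j in members):
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Two reports of the same strike, plus a semantically similar but
# distinct strike at another place and time.
e1 = EventContext(
    triples=frozenset({("agt", "strike", "worker"), ("plc", "strike", "chennai")}),
    attributes=frozenset({("place", "chennai"), ("time", "2010-05")}),
    connectives=frozenset({"because"}),
)
e2 = EventContext(
    triples=frozenset({("agt", "strike", "worker")}),
    attributes=frozenset({("place", "chennai"), ("time", "2010-05")}),
    connectives=frozenset({"because"}),
)
e3 = EventContext(
    triples=frozenset({("agt", "strike", "worker"), ("plc", "strike", "delhi")}),
    attributes=frozenset({("place", "delhi"), ("time", "2011-01")}),
    connectives=frozenset(),
)
events = [e1, e2, e3]

semantics_only = cluster(events, threshold=0.3, weights=(1.0, 0.0, 0.0))
all_features = cluster(events)
```

On this toy data, semantics_only groups all three events together, which mirrors the problem the abstract describes for the earlier semantics-only approach, while all_features separates the third event because its time and place attributes and connectives do not match.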
