Existing event detection methods rely mainly on the traditional TF-IDF document representation, which is high-dimensional and semantically sparse. This leads to low efficiency and accuracy and makes these methods unsuitable for large-scale online news event detection. This paper proposes a document representation method based on word embeddings. It reduces the dimension of the document representation, alleviates the semantic sparsity problem, and improves the efficiency and accuracy of document similarity calculation. Building on this representation, a dynamic online clustering method is proposed for online news event detection, which improves both the precision and the recall of event detection. Experiments on the standard TDT4 dataset and on a real-world dataset show that the proposed method significantly improves both the time efficiency and the accuracy of event detection compared with the state-of-the-art methods.
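The abstract does not give the details of the representation or the clustering procedure, so the following is only a minimal sketch of the general idea under stated assumptions: a document vector is taken as the plain average of its word embeddings, and clustering is a single-pass scheme that assigns each incoming document to the most similar existing cluster when the cosine similarity reaches a fixed threshold. The names `doc_vector`, `OnlineClusterer`, `detect_events`, the toy embedding table, and the threshold value are all hypothetical and are not taken from the paper.

```python
import math

def doc_vector(tokens, embeddings):
    """Average the embeddings of in-vocabulary tokens (an assumed, simple choice)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None  # no known words, so no vector can be built
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity of two dense vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class OnlineClusterer:
    """Single-pass clustering: one centroid per event, updated incrementally."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # hypothetical value, not from the paper
        self.centroids = []
        self.counts = []

    def add(self, vec):
        """Assign vec to the closest cluster or open a new one; return its index."""
        best, best_sim = -1, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            n = self.counts[best]
            # Running mean keeps the centroid equal to the average of its members.
            self.centroids[best] = [
                (c * n + x) / (n + 1) for c, x in zip(self.centroids[best], vec)
            ]
            self.counts[best] = n + 1
            return best
        self.centroids.append(list(vec))
        self.counts.append(1)
        return len(self.centroids) - 1

def detect_events(docs, embeddings, threshold=0.8):
    """Return one cluster label per document (None if it has no known words)."""
    clusterer = OnlineClusterer(threshold)
    labels = []
    for tokens in docs:
        vec = doc_vector(tokens, embeddings)
        labels.append(None if vec is None else clusterer.add(vec))
    return labels

# Toy two-dimensional embeddings, invented purely for illustration.
TOY_EMBEDDINGS = {
    "quake": [1.0, 0.0],
    "tremor": [0.9, 0.1],
    "election": [0.0, 1.0],
    "vote": [0.1, 0.9],
}
TOY_DOCS = [["quake", "tremor"], ["election", "vote"], ["tremor"], ["vote", "election"]]
```

Calling `detect_events(TOY_DOCS, TOY_EMBEDDINGS)` groups the two earthquake documents into one cluster and the two election documents into another. A dense, low-dimensional vector keeps each similarity computation cheap compared with a vocabulary-sized TF-IDF vector, which is the efficiency argument the abstract makes. The dynamic aspects of the paper's clustering method are not reproduced here.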