SUDMAD: Sequential and Unsupervised Decomposition of a Multi-Author Document Based on a Hidden Markov Model

Khaled Aldebei; Xiangjian He; Wenjing Jia; Weichang Yeh


Abstract

Decomposing a document written by more than one author into sentences based on authorship is of great significance because of the increasing demand for plagiarism detection, forensic analysis, civil law (e.g., disputed copyright issues), and intelligence work involving disputed anonymous documents. Among existing studies of document decomposition, some are limited to specific languages, some depend on topics, and some are restricted to documents with two authors; their accuracy also leaves considerable room for improvement. In this paper, we consider the contextual correlation hidden among sentences and propose an algorithm for Sequential and Unsupervised Decomposition of a Multi-Author Document (SUDMAD) that handles documents written in any language, regardless of topic, by constructing a Hidden Markov Model (HMM) that reflects the authors' writing styles. To build and learn such a model, an unsupervised statistical approach is first proposed to estimate the initial values of the HMM parameters of a preliminary model; it requires no information about the authors or the document's context other than the number of authors who contributed to the document. To further improve the performance of this approach, a boosted HMM learning procedure is then proposed, in which the initial classification results are used to create labeled training data for learning a more accurate HMM. Moreover, the contextual relationship among sentences is further exploited to refine the classification results. The proposed approach is empirically evaluated on three benchmark datasets that are widely used for authorship analysis of documents. Comparisons with recent state-of-the-art approaches are also presented to demonstrate the significance of the new ideas and the superior performance of the approach.
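The role of sentence order in an HMM-based decomposition can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the features, the parameter estimation, or the boosting step, so the function name `viterbi` and all probabilities below are hypothetical toy values chosen for illustration. The sketch assumes that each hidden state is an author, that each observation is a sentence with a known per-author likelihood, and that the most likely author sequence is found with standard Viterbi decoding.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Most likely hidden-state path of an HMM; all inputs are in log space.

    log_init:  (K,)   log P(first sentence is by author k)
    log_trans: (K, K) log P(next sentence is by author j | current is by i)
    log_emit:  (T, K) log P(sentence t | author k)
    """
    T, K = log_emit.shape
    score = np.empty((T, K))            # best log score of a path ending in k at t
    back = np.zeros((T, K), dtype=int)  # best predecessor of state k at time t
    score[0] = log_init + log_emit[0]
    for t in range(1, T):
        # cand[i, j]: best path ending in author i at t-1, then switching to j
        cand = score[t - 1][:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = score[-1].argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path.tolist()

# Hypothetical toy model: 2 authors, 6 sentences (assumed values, not from the paper).
init = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1],
                  [0.1, 0.9]])  # an author tends to write several sentences in a row
emit = np.array([[0.8, 0.2],
                 [0.7, 0.3],
                 [0.4, 0.6],    # taken alone, this sentence leans toward author 1
                 [0.8, 0.2],
                 [0.2, 0.8],
                 [0.1, 0.9]])

# Baseline that ignores order: pick the likelier author for each sentence.
independent = emit.argmax(axis=1).tolist()
# Sequential decoding that also uses the transition probabilities.
sequential = viterbi(np.log(init), np.log(trans), np.log(emit))

print("independent:", independent)
print("sequential: ", sequential)
```

With these toy numbers, the per-sentence baseline assigns the third sentence to the second author, whereas the sequential decoding keeps it with the first author, because a one-sentence switch is penalized by the transition probabilities. This is the kind of contextual correlation among sentences that the abstract refers to, shown here only under the stated assumptions.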


Bibliographic details

Source
Journal of the American Society for Information Science and Technology, 2018, Issue 2, pp. 201-214 (14 pages)
Authors
Khaled Aldebei; Xiangjian He; Wenjing Jia; Weichang Yeh
Author affiliations

Global Big Data Technologies Centre, University of Technology Sydney, Australia. Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University, Fuzhou, Fujian, 350121, China;

Global Big Data Technologies Centre, University of Technology Sydney, Australia. School of Software and Microelectronics, Northwestern Polytechnical University, China;

Global Big Data Technologies Centre, University of Technology Sydney, Australia;

Department of Industrial Engineering and Engineering Management, National Tsing Hua University, Taiwan;

Original format: PDF
Language: English

