String alignment for automated document versioning

Wei Lee Woon; Kuok-Shoong Daniel Wong


Abstract

The automated analysis of documents is an important task given the rapid increase in availability of digital texts. Automatic text processing systems often encode documents as vectors of term occurrence frequencies, a representation which facilitates the classification and clustering of documents. Historically, this approach derives from the related field of data mining, where database entries are commonly represented as points in a vector space. While this lineage has certainly contributed to the development of text processing, there are situations where document collections do not conform to this clustered structure, and where the vector representation may be unsuitable for text analysis. As a proof-of-concept, we had previously presented a framework where the optimal alignments of documents could be used for visualising the relationships within small sets of documents. In this paper we develop this approach further by using it to automatically generate the version histories of various document collections. For comparison, version histories generated using conventional methods of document representation are also produced. To facilitate this comparison, a simple procedure for evaluating the accuracy of the version histories thus generated is proposed.
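The contrast the abstract draws between term-frequency vectors and alignment-based comparison can be illustrated with a minimal sketch. The abstract does not specify the alignment algorithm or the distance measures used in the paper, so everything below is an assumption made for illustration: word-level Levenshtein distance stands in for "optimal alignment", cosine distance over raw term counts stands in for the "conventional" representation, and the function names and toy sentences are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_distance(a, b):
    """1 - cosine similarity of the raw term-count vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return 1.0 - dot / norm if norm else 1.0


def edit_distance(a, b):
    """Word-level Levenshtein distance with unit costs (an illustrative
    stand-in for an alignment score, not the paper's method)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical toy "versions" of one document.
v1 = "the cat sat on the mat".split()
v2 = "the mat sat on the cat".split()      # same words, different order
v3 = "the cat sat on the red mat".split()  # one word inserted

print(cosine_distance(v1, v2), edit_distance(v1, v2))
print(cosine_distance(v1, v3), edit_distance(v1, v3))
```

In this toy case the term-count vectors of v1 and v2 are identical, so the vector representation cannot tell the two apart, while the alignment-style distance registers the reordering. Whether this effect drives the results reported in the paper cannot be determined from the abstract alone.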

Bibliographic details

Source: Knowledge and Information Systems, 2009, Issue 3, pp. 293-309 (17 pages)
Authors: Wei Lee Woon; Kuok-Shoong Daniel Wong
Original format: PDF
Language: English
Keywords: String matching; Text processing; Data mining; Versioning; Information retrieval
