Conference: 2011 Colloquium in Information Science and Technology

Search engine of ancient Arabic manuscripts based on metadata and XML annotations


Abstract

Old manuscripts kept in libraries are among the richest parts of the cultural heritage and legacy of civilizations. Digitization is one way to preserve this cultural and historical heritage, which is very difficult for users to handle. Automatic or manual transcription of old Arabic manuscripts is a necessary step toward indexing and disseminating their contents, but the cursive nature of Arabic script is an obstacle for optical character recognition (OCR) software. Transcribing with word-processing software or into HTML is no better, because the resulting content is not structured. The complex structure of Arabic manuscripts can be approximated by a hierarchical model. In such a model, the strong dependence between the description of the data structure and the way the data are stored on physical media provides rigorous structures and access paths while keeping the implementation relatively simple. Creating a document database according to a hierarchical model requires coding and cataloging the heritage documents. eXtensible Markup Language (XML) provides a way to structure these documents and offers solutions that ensure data integrity. In the field of documentary heritage, archivists reference each document with a unique code; this code can serve as the identifier of a manuscript in our document database to avoid data redundancy. The coding of documents is validated by XML schemas, which check the format, type and semantics of the data in the XML files. The identification, collection and registration of information are handled by a search engine based on metadata and annotations. These annotations are used to generate XML tags, which facilitates the transcription of Arabic manuscripts and feeds our document database.
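The idea of generating XML tags from annotations, keyed by the archivist's unique code, can be illustrated with a minimal sketch. It is not the authors' implementation: the element names (`manuscripts`, `manuscript`, `metadata`, `annotation`), the field names and the sample code `MS-0001` are all hypothetical, and Python's standard `xml.etree.ElementTree` module stands in for whatever tooling the paper used. Schema validation is left out because the Python standard library has no XML Schema validator.

```python
import xml.etree.ElementTree as ET


def add_manuscript(db, code, metadata, annotations):
    """Append one manuscript record to the XML database `db`.

    `code` plays the role of the archivist's unique reference code. A record
    whose code already exists is rejected, so no manuscript is stored twice.
    All element and attribute names are hypothetical.
    """
    for existing in db.findall("manuscript"):
        if existing.get("id") == code:
            raise ValueError(f"manuscript {code!r} is already in the database")

    record = ET.SubElement(db, "manuscript", {"id": code})

    # One child element per metadata field.
    meta = ET.SubElement(record, "metadata")
    for field, value in metadata.items():
        ET.SubElement(meta, field).text = value

    # Each annotation is a (page number, kind, transcribed text) triple.
    for page, kind, text in annotations:
        note = ET.SubElement(
            record, "annotation", {"page": str(page), "type": kind}
        )
        note.text = text
    return record


db = ET.Element("manuscripts")
add_manuscript(
    db,
    "MS-0001",
    {"title": "Sample treatise", "language": "ar"},
    [(1, "heading", "بسم الله"), (1, "marginal-note", "sample note")],
)
print(ET.tostring(db, encoding="unicode"))
```

Using the reference code as the `id` attribute makes the duplicate check a lookup on a single attribute, which is one simple way to get the redundancy guarantee the abstract describes.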
Transcribing images of heritage documents, in particular old Arabic manuscripts, requires an XML encoding that conforms to the recommendations of the Text Encoding Initiative (TEI). This XML-TEI encoding aims to standardize the coding of these documents and to facilitate their use, exploration and dissemination, online or offline. In this paper we propose a search engine for ancient Arabic manuscripts based on metadata and XML annotations. It supports searches in a database populated with transcribed handwritten documents and returns the indexed images that correspond to users' queries. Its rich functionality, intuitive user interface, portability and extensibility, together with the power of XML technology, make the search engine platform an ideal tool for exploring ancient Arabic manuscripts.
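A search that combines metadata filters with annotation text might look like the following sketch. It rests on assumptions throughout: the record layout, the `image` attribute, the file names and the `search` function are hypothetical and are not taken from the paper, and the sample does not follow the TEI element set.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample database; the layout is illustrative, not TEI.
SAMPLE = """
<manuscripts>
  <manuscript id="MS-0001" image="ms-0001.png">
    <metadata><title>Treatise on astronomy</title><century>14</century></metadata>
    <annotation page="1" type="heading">كتاب الفلك</annotation>
  </manuscript>
  <manuscript id="MS-0002" image="ms-0002.png">
    <metadata><title>Treatise on medicine</title><century>15</century></metadata>
    <annotation page="3" type="marginal-note">كتاب الطب</annotation>
  </manuscript>
</manuscripts>
"""


def search(db, text=None, **metadata):
    """Return (id, image) pairs for the manuscripts that match the query.

    `metadata` holds exact-match filters on metadata fields; `text` is a
    substring looked up in the transcribed annotations.
    """
    hits = []
    for record in db.findall("manuscript"):
        meta_ok = all(
            record.findtext(f"metadata/{field}") == value
            for field, value in metadata.items()
        )
        text_ok = text is None or any(
            text in (note.text or "") for note in record.findall("annotation")
        )
        if meta_ok and text_ok:
            hits.append((record.get("id"), record.get("image")))
    return hits


db = ET.fromstring(SAMPLE)
print(search(db, century="15"))
print(search(db, text="الفلك"))
```

Returning the image reference alongside the identifier mirrors the abstract's point that a query should lead back to the indexed page images as well as to the transcription.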
