Conference: International Conference on Advances in ICT for Emerging Regions

Summarization based approach for Old Sinhala Text Archival Search and Preservation



Abstract

Old books must be preserved and protected for future needs, and digitization allows such archives to be kept for many years. Skew errors, noise, and poor printing mechanisms make character recognition challenging. Correcting misspelled Sinhala words is also a challenge because Sinhala is a complex language. This paper presents an extensive approach, based on machine vision and natural language processing, for preserving old text as digitally searchable content. Scanned images of old books are preprocessed to remove noise. Segmentation is then performed to ease character recognition. After optical character recognition (OCR), Sinhala spell correction is applied to correct misspelled words. The system provides separate book-wise and chapter-wise summaries to give an abstract idea of each book and chapter. Summary creation for Sinhala is a challenge because Sinhala is a structured language. The system mitigates most of these challenges, achieving average success rates of 84% for text line segmentation and layout feature identification, 74% for OCR, 70% for OCR correction, 75% for keyword extraction, and 52% for summarization.
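The abstract does not say how text line segmentation is performed. As an illustration only, the sketch below uses a horizontal projection profile, a common technique for splitting a binarized page into text lines; it is not claimed to be the authors' method. The function name `segment_lines`, the `min_ink` parameter, and the toy `page` data are all hypothetical.

```python
def segment_lines(image, min_ink=1):
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    `image` is a list of rows; each row is a list of 0 (background) or 1 (ink).
    A row belongs to a text line when it holds at least `min_ink` ink pixels.
    This is an assumed, simplified technique, not the paper's algorithm.
    """
    lines = []
    start = None  # row index where the current text line began, if any
    for y, row in enumerate(image):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                  # entering a text line
        elif not has_ink and start is not None:
            lines.append((start, y))   # leaving a text line
            start = None
    if start is not None:              # line runs to the bottom edge
        lines.append((start, len(image)))
    return lines


# Toy binarized page: two text lines separated by blank rows.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
]
print(segment_lines(page))  # [(1, 3), (5, 6)]
```

A profile-based split like this assumes the page has already been deskewed and denoised, which is consistent with preprocessing coming before segmentation in the pipeline described above.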
