Conference: International Conference on Advances in ICT for Emerging Regions

Summarization based approach for Old Sinhala Text Archival Search and Preservation

Abstract

Old books must be preserved and protected for future needs, and digitization allows old and ancient material to be conserved for many years. Skew errors, noise, and poor printing mechanisms make character recognition challenging. Correcting misspelled Sinhala words is also a challenge because Sinhala is a complex language. This paper describes an extensive approach, based on machine vision and natural language processing, for preserving old text content as digitally searchable content. Scanned images of old books are preprocessed to remove noise. Segmentation is then performed to ease character recognition. After optical character recognition (OCR), Sinhala spelling correction is applied to fix misspelled words. The system provides separate book-wise and chapter-wise summaries that give an abstract idea of each book and chapter. Creating summaries for Sinhala is a challenge because Sinhala is a structured language. The system mitigates most of these challenges, achieving average success rates of 84% for text line segmentation and layout feature identification, 74% for OCR, 70% for OCR correction, 75% for keyword extraction, and 52% for summarization.
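The abstract does not say how text line segmentation is implemented. As a rough illustration of that step only, the sketch below uses a horizontal projection profile, a common baseline that is assumed here and is not taken from the paper. It also assumes the page has already been deskewed and binarized; the function name `segment_lines`, the `min_ink` threshold, and the toy `page` array are hypothetical.

```python
from typing import List, Tuple


def segment_lines(binary: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges of text lines, with bottom exclusive.

    `binary` is a page image given as rows of 0 (background) and 1 (ink).
    This is an assumed baseline technique, not the method used in the paper.
    """
    # Horizontal projection profile: count of ink pixels in each row.
    profile = [sum(row) for row in binary]

    lines: List[Tuple[int, int]] = []
    start = None  # first row of the line currently being scanned, if any
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                    # entering a text line
        elif ink < min_ink and start is not None:
            lines.append((start, y))     # leaving a text line
            start = None
    if start is not None:
        # The last line runs to the bottom edge of the page.
        lines.append((start, len(profile)))
    return lines


# Toy 7x6 "page" with two text lines separated by blank rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
print(segment_lines(page))  # [(1, 3), (5, 6)]
```

On real scans, residual skew and noise blur the gaps between lines in the profile, which is one reason the preprocessing step described in the abstract comes before segmentation.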
