Semantic analysis based forms information retrieval and classification

Abstract

Data entry forms are used in all types of enterprises to collect information from hundreds of customers every day. Customers fill in the forms by hand, so having human operators transfer this information into computers manually is laborious, time-consuming and expensive, and human errors can introduce serious flaws. Automatic interpretation of scanned forms has improved the speed and accuracy of many real applications, such as keyword spotting, sorting of postal addresses, script matching and writer identification. This research deals with different strategies for extracting customer information from scanned forms and for interpreting and classifying it. The extracted information is segmented into characters for classification and finally stored as records in databases for further processing. This paper presents a detailed discussion of these semantic-analysis-based strategies for forms processing. Finally, new directions are recommended for future research.

Keywords: Form processing; Data interpretations; Preprocessing; Validation strategies; Semantic analysis; Post-processing

About this article
Title: Semantic analysis based forms information retrieval and classification
Journal: 3D Research 4:4
Online date: August 2013
DOI: 10.1007/3DRes.03(2013)4
Online ISSN: 2092-6731
Publisher: Springer Berlin Heidelberg
Topics: Signal, Image and Speech Processing; Computer Imaging, Vision, Pattern Recognition and Graphics; Optics, Optoelectronics, Plasmonics and Optical Devices
Authors: Tanzila Saba (1), Fatimah Ayidh Alqahtani (1)
Author affiliations: 1. College of Engineering and Computer Sciences, Salman bin Abdul Aziz University, Alkharj, Saudi Arabia
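
The abstract outlines a pipeline in which text extracted from a scanned form is segmented into characters, each character is classified, and the result is stored as a database record. The following is a minimal Python sketch of that flow, not the authors' method. Everything in it is an assumption made for illustration: the tiny binary glyph templates, the blank-column segmentation, the nearest-template classifier, and the `form_records` table and function names.

```python
import sqlite3

# Hypothetical tightly cropped 3-row glyph templates ("1" = ink).
# A real system would use a trained classifier instead.
TEMPLATES = {
    "L": ("10", "10", "11"),
    "O": ("111", "101", "111"),
    "T": ("111", "010", "010"),
}

# Hypothetical binarized field image containing three characters.
SAMPLE_FIELD = (
    "1001110111",
    "1001010010",
    "1101110010",
)

def segment_columns(rows):
    """Split a binary field image into character images at blank columns."""
    width = len(rows[0])
    ink = [any(row[x] == "1" for row in rows) for x in range(width)]
    spans, start = [], None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x                      # a character begins
        elif not has_ink and start is not None:
            spans.append((start, x))       # a character ends at a blank column
            start = None
    if start is not None:
        spans.append((start, width))
    return [tuple(row[a:b] for row in rows) for a, b in spans]

def classify(glyph):
    """Return the label of the nearest template by Hamming distance."""
    def distance(template):
        if len(template) != len(glyph) or len(template[0]) != len(glyph[0]):
            return float("inf")            # different size: cannot match
        return sum(p != q
                   for t_row, g_row in zip(template, glyph)
                   for p, q in zip(t_row, g_row))
    return min(TEMPLATES, key=lambda label: distance(TEMPLATES[label]))

def process_field(conn, field_name, rows):
    """Segment, classify, and store one form field as a database record."""
    text = "".join(classify(glyph) for glyph in segment_columns(rows))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS form_records (field TEXT, value TEXT)")
    conn.execute("INSERT INTO form_records VALUES (?, ?)", (field_name, text))
    return text

if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(process_field(connection, "name", SAMPLE_FIELD))  # prints LOT
```

Splitting at blank columns only works when characters do not touch. Cursive or touching handwriting needs a more elaborate segmentation step.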
