Published in: 2014 5th International Conference - Confluence The Next Generation Information Technology Summit

Script identification and language detection of 12 Indian languages using DWT and template matching of Frequently Occurring Character(s)

Abstract

India is a diverse country with many cultural and traditional differences. More than 12 distinct languages are used in the country, viz., Hindi, Bangla, Marathi, Oriya, Tamil, Telugu, Assamese, Manipuri, Gujarati, Kannada, Malayalam, Panjabi, Nepali, Tibetan, Urdu, etc. Optical Character Recognition (OCR) for Indian languages therefore needs to identify the language of the input document automatically before further processing. Many techniques have already been implemented, but identifying the correct language remains difficult because some languages use the same or a similar script. For example, the Bangla script is used to write Bengali, Assamese and Manipuri. Although scripts can be distinguished using global techniques, languages that share a similar script are still hard to tell apart. To address this problem, a robust technique combining the discrete wavelet transform (DWT) with template matching of frequently occurring characters, invariant to rotation, scale and translation, is deployed to identify the script and detect the language of the document automatically.
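The abstract does not specify the wavelet, the features, or the matching procedure, so what follows is only a minimal Python sketch of the two ingredients it names, under stated assumptions. It assumes a one-level Haar wavelet (the paper's wavelet family is not given), uses the mean absolute value of each detail subband as a simple script texture feature, and uses zero-mean normalized cross-correlation (NCC) as the template-matching score. The helper names (`haar_dwt2`, `dwt_energy_features`, `ncc`, `best_match`, `detect_language`) and all templates are hypothetical. Sliding a template over the image gives only translation invariance; the rotation and scale invariance the abstract claims would need extra steps (for example, matching rotated and rescaled copies of each template) that are not shown here.

```python
import math
from typing import Dict, List, Tuple

# Grayscale image as rows of floats; ink = 1.0, background = 0.0 (assumed convention).
Image = List[List[float]]


def haar_dwt2(img: Image) -> Tuple[Image, Image, Image, Image]:
    """One-level 2D Haar transform with averaging normalization (assumed wavelet).

    Returns (approx, horiz, vert, diag) subbands at half resolution.
    Assumes the image height and width are both even.
    """
    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, len(img), 2):
        ra, rh, rv, rd = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]          # top-left, top-right
            d, e = img[r + 1][c], img[r + 1][c + 1]  # bottom-left, bottom-right
            ra.append((a + b + d + e) / 4)  # local average
            rh.append((a + b - d - e) / 4)  # top minus bottom: horizontal edges
            rv.append((a - b + d - e) / 4)  # left minus right: vertical edges
            rd.append((a - b - d + e) / 4)  # diagonal detail
        approx.append(ra)
        horiz.append(rh)
        vert.append(rv)
        diag.append(rd)
    return approx, horiz, vert, diag


def dwt_energy_features(img: Image) -> Dict[str, float]:
    """Mean absolute coefficient of each detail subband (hypothetical feature)."""
    _, horiz, vert, diag = haar_dwt2(img)

    def energy(band: Image) -> float:
        vals = [abs(v) for row in band for v in row]
        return sum(vals) / len(vals)

    return {"horiz": energy(horiz), "vert": energy(vert), "diag": energy(diag)}


def ncc(patch: Image, tmpl: Image) -> float:
    """Zero-mean normalized cross-correlation of two equal-size images, in [-1, 1]."""
    p = [v for row in patch for v in row]
    t = [v for row in tmpl for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((x - mp) * (y - mt) for x, y in zip(p, t))
    den = math.sqrt(sum((x - mp) ** 2 for x in p) * sum((y - mt) ** 2 for y in t))
    return num / den if den > 0 else 0.0  # flat patch or template: no correlation


def best_match(img: Image, tmpl: Image) -> Tuple[float, int, int]:
    """Slide tmpl over every position of img; return (best score, row, col).

    Exhaustive sliding makes the score translation-invariant only.
    """
    th, tw = len(tmpl), len(tmpl[0])
    best = (float("-inf"), -1, -1)
    for r in range(len(img) - th + 1):
        for c in range(len(img[0]) - tw + 1):
            patch = [row[c:c + tw] for row in img[r:r + th]]
            score = ncc(patch, tmpl)
            if score > best[0]:
                best = (score, r, c)
    return best


def detect_language(img: Image, templates: Dict[str, Image]) -> str:
    """Return the language whose (hypothetical) frequent-character template matches best."""
    return max(templates, key=lambda lang: best_match(img, templates[lang])[0])


if __name__ == "__main__":
    headline = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(dwt_energy_features(headline))  # {'horiz': 0.25, 'vert': 0.0, 'diag': 0.0}
```

As an illustration, a page dominated by horizontal strokes, such as the headline that runs along the top of Devanagari and Bangla words, shifts energy into the `horiz` subband; whether the paper relies on this particular cue is not stated. In `detect_language`, the dictionary keys could be languages that share a script (for example, Bengali and Assamese), each mapped to a template of one of its frequently occurring characters.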
