Proceedings of the 1st ACM workshop on Hardcopy document processing

A text image enhancement system based on segmentation and classification methods

Abstract

I would like to welcome the attendees of the Hardcopy Document Processing (HDP) Workshop and thank them for their interest in this subject.

The stated purpose of the CIKM 2004 Conference "is to identify challenging problems facing the development of future knowledge and information systems." A major and growing challenge for industry and government is the need to access and process the content of hardcopy documents. The focus of the workshop is to present current research and development addressing this challenge. During the 1990s, CEOs realized that their companies' intellectual assets represented the majority of corporate wealth, and the discipline of knowledge management was born to leverage these resources. Intellectual assets are primarily contained in hardcopy document collections. Many of these collections were scanned, indexed, OCRed, and placed on corporate intranets to gain competitive advantage.

Businesses, particularly in highly regulated industries such as the pharmaceutical, environmental, and transportation industries, generate hardcopy records that must be retrievable to demonstrate compliance. Accurate document retrieval requires sufficient indexing. Unfortunately, sufficient indexing requires a priori knowledge of future, unknown requirements.

Many applications need to retrieve and process hardcopy documents on an ongoing basis. In many cases, documents must be exploited in near-real time for their content to be actionable. Further, documents of interest tend to be very noisy and often contain multiple handwritten annotations or other marks.

Currently, the only viable solution is to retrieve and process the content of OCRed documents. In virtually all situations, the cost of correcting OCR output, in either time or capital, is prohibitive. Therefore, either OCR accuracy must be improved, the ability to process noisy OCR output must be improved, or new, innovative techniques must be developed to process text in the image domain.

The ability to process hardcopy documents is a challenge of international importance and an appropriate workshop topic for this CIKM Conference. The focus of the Hardcopy Document Processing Workshop is to bring together the text processing and information retrieval research communities along with the users who face the challenge of processing information from hardcopy documents. The purpose will be to gain a better understanding of the current state of the art and of the needs of the user community through an exchange of ideas. The target audience will be a mixture of academics and researchers from the user communities.