Binary watermarking robust to printing and scanning is the process of imperceptibly hiding information in binary documents, typically text documents, so that the hidden information can still be recovered after the document has been printed and scanned. It is a challenging problem, both in finding an imperceptible way to hide data within a sparse text document and in providing an embedding strategy that can handle the many distortions introduced during printing and scanning. Our goal was to develop a scheme with sufficient capacity to embed our proposed authenticating and localising watermark. Existing schemes did not provide sufficient capacity, so we developed techniques to increase the embedding capacity whilst maintaining robustness to printing and scanning. In this thesis we present two distinct approaches to binary watermarking robust to printing and scanning. Our first approach, Binary Text Watermarking, embeds a watermark by adjusting the white space between adjacent words, and forms the main focus of this thesis. A fundamental requirement of Binary Text Watermarking is the correct classification of white space, which is difficult because of the variation between fonts and font sizes and the small tolerances that separate success from failure. The task is compounded by the requirement to classify white space in the same way after printing and scanning, and even photocopying. Our proposed techniques, Frequency Thresholding, Frequency Shaping and PDF Analysis, have been designed to cope with distortion and to mitigate the risk of misclassification. We have analysed 864 test documents to validate these white space classification techniques, and in doing so have discovered a number of interesting characteristics of the distribution of white space within documents.
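To make the white space classification problem concrete, the following is a minimal, hypothetical sketch of a frequency-based rule for separating inter-character gaps from inter-word gaps on a single text line. It is not the thesis's Frequency Thresholding algorithm; the function name `classify_gaps`, the assumption that gap widths are integer pixel counts, and the "first empty histogram bin above the modal width" rule are all illustrative choices.

```python
from collections import Counter


def classify_gaps(gaps):
    """Label each horizontal gap (integer pixel width) as 'char' or 'word'.

    Hypothetical rule: inter-character gaps are assumed to dominate the
    histogram, so the modal width is taken as the inter-character peak and
    the threshold is placed at the first empty histogram bin above it.
    Gaps wider than the threshold are classed as inter-word spaces.
    """
    counts = Counter(gaps)
    # Most frequent width; ties are broken in favour of the smaller width.
    mode = max(counts, key=lambda w: (counts[w], -w))
    threshold = mode + 1
    while counts.get(threshold, 0) > 0:
        threshold += 1
    labels = ["word" if g > threshold else "char" for g in gaps]
    return threshold, labels


# Example: small gaps between letters, a few wide gaps between words.
threshold, labels = classify_gaps([2, 3, 2, 2, 3, 9, 2, 3, 10, 2, 9])
print(threshold, labels)
```

A rule this simple is exactly what printing and scanning tend to break: blur and resampling fill in the empty bins between the two peaks, which is why the thesis proposes more robust techniques.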
To increase the embedding capacity available within a document, we have proposed a new set allocation process in the form of Continuous Line Watermarking. In addition, we have introduced two new set modulation techniques: Quad Set Watermarking and Ternary Watermarking. We have achieved an overall increase in capacity of 174% compared with previous work, whilst maintaining robustness to printing and scanning, and even photocopying. To validate this, we printed and scanned 504 test documents covering different fonts and variants of our scheme. We have also developed an end-user application with a fully featured user interface that allows users to analyse, embed, detect and authenticate documents using our watermarking schemes. Our second approach to binary watermarking, the Imperceptible Yellow Printer Dots method, is orthogonal to our Binary Text Watermarking method. It is based on the methods printer manufacturers use to identify the printer on which a document was printed, and relies on the limitations of the human visual system, which prevent people from seeing tiny yellow dots on white paper. The novelty of our approach is the automatic embedding and detection of a custom watermark using an off-the-shelf printer and scanner. The method must handle distortions similar to those encountered by Binary Text Watermarking whilst still reliably finding the tiny yellow dots. Our Imperceptible Yellow Printer Dots scheme provides a capacity more than 18 times greater than that of our Binary Text Watermarking method.
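The idea of set modulation of inter-word spaces can be illustrated with a generic two-set scheme. The sketch below is an assumption for illustration only, not Continuous Line, Quad Set or Ternary Watermarking as defined in the thesis. The names `embed_bit` and `detect_bit`, the allocation of even-indexed spaces to set A and odd-indexed spaces to set B, and the `delta` adjustment of 2 pixels are all hypothetical. A bit is encoded by making the mean width of one set larger than the other, and decoded by comparing the two means. Comparing means rather than absolute widths is what gives such schemes some tolerance to the uniform scaling and blur introduced by printing and scanning.

```python
def embed_bit(spaces, bit, delta=2):
    """Return new inter-word space widths (pixels) encoding one bit.

    Hypothetical two-set allocation: even-indexed spaces form set A and
    odd-indexed spaces form set B. Bit 1 widens A and narrows B by delta;
    bit 0 does the reverse. When A and B have the same number of spaces
    (and no width is clipped), the total line width is unchanged.
    """
    sign = 1 if bit else -1
    out = []
    for i, s in enumerate(spaces):
        change = sign * delta if i % 2 == 0 else -sign * delta
        out.append(max(1, s + change))  # never shrink a space below 1 pixel
    return out


def detect_bit(spaces):
    """Recover the bit by comparing the mean widths of sets A and B.

    Requires at least two spaces so that both sets are non-empty.
    """
    a, b = spaces[0::2], spaces[1::2]
    return 1 if sum(a) / len(a) > sum(b) / len(b) else 0


line = [10, 10, 10, 10]
marked = embed_bit(line, 1)
print(marked, detect_bit(marked))
```

Richer modulations, such as using more than two sets or more than two relative orderings per group of spaces, can carry more than one bit per group, which is the general direction in which capacity can be increased.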