As electronic storage, retrieval and distribution of documents become faster and cheaper, documents and papers are becoming increasingly
digital. Over the last decade, existing documents have usually been re-typed and converted to HTML or Adobe's PDF format. A simple alternative
is to scan the original page and compress the image in JPEG or GIF format with a standard compression algorithm. Unfortunately, those files
tend to be quite large if one wants to preserve the readability of the text. We need an approach for compressing document images
that makes it possible to transfer a high-quality page at a very high compression ratio.
The WDIC document image compression technique described here is designed to overcome all the above problems.
The main idea of our document image compression technique is to partition the page into four parts, encoded separately, from which the original image can
be reconstructed: the character images, the picture images, the line images and the background image. The character images are encoded with
a novel algorithm that combines extent-based morphological matching and clustering with wavelet compression. A picture image is encoded with a wavelet-based
compression algorithm, which is suitable for gray scale images. A line is encoded with a one-dimensional wavelet-based compression algorithm.
The background image is also encoded with a wavelet-based compression algorithm.
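To make the one-dimensional case concrete, here is a minimal sketch of wavelet coding applied to a gray-level profile sampled along a line. It is an illustration only: the text does not say which wavelet WDIC uses, so the orthonormal Haar wavelet is an assumption, and `haar_forward`, `haar_inverse`, `keep_largest` and the sample `profile` are hypothetical names and data, not part of WDIC.

```python
import math


def haar_forward(signal, levels):
    """Multi-level orthonormal Haar transform of a 1-D signal.

    Assumption: Haar is used only for illustration; len(signal) must be
    divisible by 2**levels.
    """
    coeffs = [float(v) for v in signal]
    n = len(coeffs)
    if n % (1 << levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    s = math.sqrt(2.0)
    for _ in range(levels):
        half = n // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / s for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / s for i in range(half)]
        coeffs[:n] = approx + detail  # approximations first, then details
        n = half
    return coeffs


def haar_inverse(coeffs, levels):
    """Invert haar_forward for the same number of levels."""
    out = [float(v) for v in coeffs]
    n = len(out) >> levels
    s = math.sqrt(2.0)
    for _ in range(levels):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged.append((a + d) / s)
            merged.append((a - d) / s)
        out[:2 * n] = merged
        n *= 2
    return out


def keep_largest(coeffs, keep):
    """Zero all but the `keep` largest-magnitude coefficients.

    This is a crude stand-in for the quantization and entropy coding
    that a real codec would apply.
    """
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = [0.0] * len(coeffs)
    for i in order[:keep]:
        kept[i] = coeffs[i]
    return kept


# Hypothetical profile along a ruled line: white, dark, white.
profile = [255] * 16 + [0] * 32 + [255] * 16
coeffs = haar_forward(profile, levels=4)
restored = haar_inverse(keep_largest(coeffs, keep=4), levels=4)
max_error = max(abs(a - b) for a, b in zip(profile, restored))
nonzero = sum(1 for c in coeffs if abs(c) > 1e-9)
print(nonzero, "nonzero coefficients out of", len(coeffs), "- max error", max_error)
```

A piecewise-constant profile like this one concentrates its energy in a handful of coefficients, which is why a line segment is cheap to store in the wavelet domain.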
With WDIC, a typical A4 size document page in 8-bit gray scale at 100 dpi can be compressed to 5K-30K. Compared with the original 100 dpi raw image,
WDIC achieves a compression ratio ranging from 40 to 200 times. For a typical document page at 300 dpi, WDIC achieves a compression ratio ranging from 100 to 400 times.
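The figures above can be sanity-checked with a little arithmetic. The sketch below assumes an A4 page of 210 x 297 mm, one byte per pixel for 8-bit gray scale, and reads "K" as 1024 bytes; none of these conventions is stated in the text, and the function names are hypothetical.

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)  # assumed A4 dimensions (width, height) in millimetres


def raw_size_bytes(dpi, bits_per_pixel=8):
    """Uncompressed size of an A4 scan at the given resolution."""
    width = round(A4_MM[0] / MM_PER_INCH * dpi)
    height = round(A4_MM[1] / MM_PER_INCH * dpi)
    return width * height * bits_per_pixel // 8


def compression_ratio(raw_bytes, compressed_bytes):
    """Ratio of raw size to compressed size."""
    return raw_bytes / compressed_bytes


raw_100 = raw_size_bytes(100)
raw_300 = raw_size_bytes(300)

# Ratios implied by the quoted 5K-30K range at 100 dpi (K = 1024 bytes assumed).
for kilobytes in (5, 30):
    ratio = compression_ratio(raw_100, kilobytes * 1024)
    print(f"100 dpi, {kilobytes}K compressed: about {ratio:.0f}x")

# Compressed sizes implied by the quoted 100x-400x range at 300 dpi.
for ratio in (100, 400):
    print(f"300 dpi at {ratio}x: about {raw_300 / ratio / 1024:.0f}K")
```

Under these assumptions the raw 100 dpi page is a little under one megabyte, so a 5K-30K result is of the same order as the quoted 40 to 200 times.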
WDIC is a progressive codec. It provides progressive decoding not only of the background, but also of character images, picture images and line images.
Users can choose the compression ratio interactively to obtain a satisfactory image quality. WDIC can also automatically attain the highest compression
ratio that maintains high quality for text-only pages without pictures.
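Progressive decoding in a wavelet codec generally means that the decoder can rebuild a usable image from the coarsest coefficients and refine it as more data arrives. The sketch below illustrates that general idea on a one-dimensional signal. It is not WDIC's bitstream format, which the text does not describe: the Haar wavelet, the subband ordering, and the names `analyze` and `synthesize` are all assumptions made for illustration.

```python
import math

S = math.sqrt(2.0)


def analyze(signal, levels):
    """Split a signal into Haar subbands: [approx, coarsest detail, ..., finest detail].

    len(signal) must be divisible by 2**levels.
    """
    approx = [float(v) for v in signal]
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.insert(0, [(a - b) / S for a, b in pairs])
        approx = [(a + b) / S for a, b in pairs]
    return [approx] + details


def synthesize(subbands, received):
    """Rebuild the signal from the first `received` subbands (at least 1).

    Subbands that have not arrived yet are treated as all zeros.
    """
    approx = list(subbands[0])
    for k, detail in enumerate(subbands[1:], start=1):
        if k >= received:
            detail = [0.0] * len(detail)
        merged = []
        for a, d in zip(approx, detail):
            merged += [(a + d) / S, (a - d) / S]
        approx = merged
    return approx


# Hypothetical gray levels along one scan line.
signal = [12, 14, 200, 210, 205, 198, 20, 18, 15, 17, 180, 190, 185, 175, 22, 19]
subbands = analyze(signal, levels=3)

errors = []
for received in range(1, len(subbands) + 1):
    preview = synthesize(subbands, received)
    errors.append(sum((a - b) ** 2 for a, b in zip(signal, preview)))
    print(f"{received} of {len(subbands)} subbands: squared error {errors[-1]:.1f}")
```

Because the transform is orthonormal, each additional subband can only reduce the reconstruction error, so a viewer could stop decoding as soon as the quality is acceptable.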
A comparison of WDIC with JPEG and SPIHT-like wavelet compression is given below. In our experiment, one A4 size document page at 100 dpi is compressed.
Only the upper-left part of the document is displayed.