Download WDIC here

WDIC: Wavelet-based Document Image Compression

As electronic storage, retrieval and distribution of documents become faster and cheaper, documents and papers are increasingly digital. Over the last decade, existing documents have usually been re-typed and converted to HTML or Adobe's PDF format. A simpler alternative is to scan the original page and compress the image in JPEG or GIF format with a standard compression algorithm. Unfortunately, those files tend to be quite large if one wants to preserve the readability of the text. What is needed is an approach to compressing document images that makes it possible to transfer a high-quality page at a very high compression ratio.

The WDIC document image compression technique described here is designed to address these problems.

The main idea of our document image compression technique is to partition the page into four parts, encode each part separately, and reconstruct the original image from them: the character images, the picture images, the line images and the background image. Character images are encoded with a novel algorithm that combines extent-based morphological matching and clustering with wavelet compression. Picture images are encoded with a wavelet-based compression algorithm suited to gray-scale images. Lines are encoded with a one-dimensional wavelet-based compression algorithm. The background image is also encoded with a wavelet-based compression algorithm.
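To make the matching-and-clustering idea for character images more concrete, here is a minimal Python sketch. It is not WDIC's extent-based morphological algorithm, which is not specified here; it swaps in a much simpler stand-in, greedy clustering by pixel mismatch count, and the names `mismatch`, `cluster_glyphs` and `max_mismatch` are hypothetical. The point it illustrates is only that repeated characters can be stored once as a prototype plus a list of indices.

```python
from typing import List, Tuple

# A glyph is a tuple of rows; each row is a string of '0'/'1' pixels.
Glyph = Tuple[str, ...]


def same_size(a: Glyph, b: Glyph) -> bool:
    """True when both glyphs have identical height and row widths."""
    return len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))


def mismatch(a: Glyph, b: Glyph) -> int:
    """Count differing pixels between two equally sized binary glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def cluster_glyphs(glyphs: List[Glyph], max_mismatch: int):
    """Greedy clustering (an assumed stand-in, not the WDIC method).

    Each glyph joins the first prototype of the same size that differs
    in at most `max_mismatch` pixels; otherwise it becomes a new prototype.
    Returns (prototypes, indices), where indices[k] names the prototype
    that represents glyphs[k].
    """
    prototypes: List[Glyph] = []
    indices: List[int] = []
    for glyph in glyphs:
        for i, proto in enumerate(prototypes):
            if same_size(glyph, proto) and mismatch(glyph, proto) <= max_mismatch:
                indices.append(i)
                break
        else:
            prototypes.append(glyph)
            indices.append(len(prototypes) - 1)
    return prototypes, indices


# Toy 3x3 glyphs: an "I", a slightly noisy "I", and an "L".
glyph_i = ("010", "010", "010")
glyph_i_noisy = ("010", "110", "010")
glyph_l = ("100", "100", "111")

prototypes, indices = cluster_glyphs(
    [glyph_i, glyph_l, glyph_i_noisy, glyph_l], max_mismatch=1
)
```

In this toy input, four glyphs collapse to two prototypes, so only the two bitmaps and four small indices would need to be stored. A real codec would also record each glyph's position on the page.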

With WDIC, a typical A4 document page in 8-bit gray scale at 100 dpi can be compressed to between 5 KB and 30 KB. Compared with the original 100 dpi raw image, WDIC achieves a compression ratio of 40 to 200. For a typical document page at 300 dpi, WDIC achieves a compression ratio of 100 to 400.
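The arithmetic behind these ratios can be checked with a short Python sketch. It assumes an A4 page of 210 mm by 297 mm, one byte per pixel for 8-bit gray scale, and that "K" means 1024 bytes; the helper names `raw_size_bytes` and `compression_ratio` are illustrative only.

```python
MM_PER_INCH = 25.4


def raw_size_bytes(width_mm: float, height_mm: float, dpi: int,
                   bits_per_pixel: int = 8) -> int:
    """Uncompressed size of a scanned page, ignoring any file header."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bits_per_pixel // 8


def compression_ratio(raw_bytes: int, compressed_bytes: int) -> float:
    """Ratio of raw size to compressed size."""
    return raw_bytes / compressed_bytes


# A4 at 100 dpi, 8-bit gray scale (assumed page dimensions: 210 mm x 297 mm).
a4_raw = raw_size_bytes(210, 297, 100)

# Assumed interpretation: 5K = 5 * 1024 bytes, 30K = 30 * 1024 bytes.
best_ratio = compression_ratio(a4_raw, 5 * 1024)
worst_ratio = compression_ratio(a4_raw, 30 * 1024)

print(f"raw size: {a4_raw} bytes")
print(f"ratio at 5K:  {best_ratio:.1f}")
print(f"ratio at 30K: {worst_ratio:.1f}")
```

Under these assumptions the raw page is 827 by 1169 pixels, about 0.97 MB, and compressed sizes of 5K and 30K correspond to ratios of roughly 189 and 31, the same order of magnitude as the range quoted above.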

WDIC is a progressive codec. It provides progressive decoding not only for the background, but also for character images, picture images and line images. Users can choose the compression ratio interactively until the image quality is satisfactory. For text-only pages without pictures, WDIC can also automatically select the highest compression ratio that still maintains high quality.
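Progressive decoding with wavelets can be illustrated on a one-dimensional signal, such as a single scan line. The Python sketch below is an assumption-laden toy: it uses the Haar wavelet and a plain coarse-to-fine coefficient order, neither of which is confirmed as WDIC's choice, and the function names are hypothetical. It shows that decoding only the first few coefficients gives a coarse approximation that improves as more coefficients arrive.

```python
from typing import List


def haar_forward(signal: List[float]) -> List[float]:
    """Multi-level Haar transform; the length must be a power of two.

    Output layout: [overall average, coarsest detail, ..., finest details].
    """
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:n] = averages + details
        n = half
    return coeffs


def haar_inverse(coeffs: List[float]) -> List[float]:
    """Invert haar_forward, rebuilding from the coarsest level to the finest."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged: List[float] = []
        for avg, det in zip(out[:n], out[n:2 * n]):
            merged += [avg + det, avg - det]
        out[:2 * n] = merged
        n *= 2
    return out


def progressive_decode(coeffs: List[float], received: int) -> List[float]:
    """Reconstruct from only the first `received` coefficients.

    Coefficients that have not arrived yet are treated as zero.
    """
    partial = coeffs[:received] + [0.0] * (len(coeffs) - received)
    return haar_inverse(partial)


def total_error(a: List[float], b: List[float]) -> float:
    """Sum of absolute differences between two signals."""
    return sum(abs(x - y) for x, y in zip(a, b))


scan_line = [9.0, 7.0, 3.0, 5.0, 6.0, 10.0, 2.0, 6.0]
coeffs = haar_forward(scan_line)

for received in (1, 4, 8):
    approx = progressive_decode(coeffs, received)
    print(received, approx, total_error(scan_line, approx))
```

With one coefficient the decoder recovers only the mean gray level; with four it recovers a blocky approximation; with all eight it recovers the scan line exactly. A practical codec would also quantize and entropy-code the coefficients, which this sketch omits.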

A comparison of WDIC with JPEG and SPIHT-like wavelet compression is given below. In our experiment, one A4 document page at 100 dpi is compressed. Only the upper-left part of the document is displayed.