
Scanning in DjVu

Have you heard of “déjà vu”? In French the phrase literally means “already seen”, and it describes the strange feeling, familiar to most of us, that a new situation or person is something we have experienced before, even though we cannot recall the exact occasion. There may be several religious interpretations of this, but as far as I know there is no accepted scientific explanation for it yet (at least I couldn't find one).

I don't know why they chose the same name, but DjVu is a file format similar to PDF that produces significantly smaller files. It was developed by AT&T, and the commercial rights were later transferred to LizardTech. Last year they were transferred again, to Celartem Technology, the parent company of LizardTech. However, DjVu is a free file format, meaning that the specification and the reference libraries are freely available. As with PDF, any user can view a DjVu document by installing a free browser plug-in. The commercial ownership applies only to the encoding technology.

Below are some interesting comparisons from DjVu.org (I have yet to test these in practice):

Scanned pages at 300 DPI in full color can be compressed from about 25MB down to files of 30 to 100KB.
Black-and-white pages at 300 DPI typically occupy 5 to 30KB when compressed.
For color document images that contain both text and pictures, DjVu files are typically 5 to 10 times smaller than JPEG at similar quality.
For black-and-white pages, DjVu files are typically 10 to 20 times smaller than JPEG and 5 times smaller than GIF.
DjVu files are also about 3 to 8 times smaller than black-and-white PDF files produced from scanned documents.
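
As a rough sanity check on the 25MB figure, the arithmetic below works out the uncompressed size of one scanned page and the compression ratios the quoted file sizes would imply. It is a sketch under my own assumptions, not data from DjVu.org: a US Letter page (8.5 by 11 inches), 3 bytes per pixel for full colour, 1 bit per pixel for black and white, and 1KB taken as 1000 bytes.

```python
# Back-of-the-envelope size of one scanned page (assumptions, not DjVu.org data).
PAGE_WIDTH_IN, PAGE_HEIGHT_IN = 8.5, 11.0  # assumed: US Letter page
DPI = 300

width_px = round(PAGE_WIDTH_IN * DPI)    # 2550 pixels across
height_px = round(PAGE_HEIGHT_IN * DPI)  # 3300 pixels down
pixels = width_px * height_px

colour_bytes = pixels * 3   # 24-bit RGB, uncompressed
bw_bytes = pixels // 8      # 1 bit per pixel, uncompressed (pixels divides evenly by 8)

def ratio_range(raw_bytes, smallest_kb, largest_kb):
    """Return (lowest, highest) compression ratio for a compressed size range in KB (1KB = 1000 bytes)."""
    return raw_bytes / (largest_kb * 1000), raw_bytes / (smallest_kb * 1000)

print(f"colour page: {colour_bytes / 1e6:.1f} MB raw")
print("colour ratios: %.0f:1 to %.0f:1" % ratio_range(colour_bytes, 30, 100))
print(f"black-and-white page: {bw_bytes / 1e6:.2f} MB raw")
print("black-and-white ratios: %.0f:1 to %.0f:1" % ratio_range(bw_bytes, 5, 30))
```

With these assumptions a colour page comes to about 25.2MB raw, so a 30 to 100KB DjVu file would mean a ratio of roughly 250:1 to 840:1. A black-and-white page is about 1.05MB raw, so 5 to 30KB corresponds to roughly 35:1 to 210:1.
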
Several technologies make it possible for DjVu to keep images very clear at such small file sizes. The first is the compression scheme. Instead of compressing a page as a single image, DjVu splits it into three images: a foreground image, a background image and a mask image. The mask is a high-resolution image that holds the text layer and decides, pixel by pixel, whether the foreground or the background colour is shown. It uses a special compression technique: each distinct character shape is compressed only once, and for every later occurrence of the same character only its location is recorded. The other two layers are stored in colour at low resolution. Thanks to this, a DjVu file containing a lot of text is significantly smaller than a similar PDF file. In addition, a DjVu file is decompressed in several stages, so the user sees an initial view very quickly and the full-quality image appears a few moments later.
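
To make the three-layer idea concrete, here is a toy model in Python. It is only an illustration under simplifying assumptions and is not the actual DjVu codecs (JB2 for the mask, IW44 wavelet coding for the colour layers): glyphs are tiny 0/1 bitmaps, only exactly identical shapes share a dictionary entry, and the low-resolution colour layers are enlarged by simple pixel repetition. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Bitmap = List[List[int]]          # rows of 0/1 pixels
RGB = Tuple[int, int, int]
ColourLayer = List[List[RGB]]

def build_symbol_dictionary(glyphs: List[Tuple[Bitmap, int, int]]):
    """Store each distinct glyph shape once and record (symbol_id, x, y) for every occurrence."""
    dictionary: List[Bitmap] = []
    index: Dict[Tuple[Tuple[int, ...], ...], int] = {}
    placements: List[Tuple[int, int, int]] = []
    for bitmap, x, y in glyphs:
        key = tuple(tuple(row) for row in bitmap)  # hashable copy of the shape
        if key not in index:                       # first time this shape appears
            index[key] = len(dictionary)
            dictionary.append(bitmap)
        placements.append((index[key], x, y))      # later copies cost only a position
    return dictionary, placements

def render_mask(dictionary: List[Bitmap], placements, width: int, height: int) -> Bitmap:
    """Rebuild the high-resolution mask by stamping each dictionary shape at its positions."""
    mask = [[0] * width for _ in range(height)]
    for symbol_id, x, y in placements:
        for dy, row in enumerate(dictionary[symbol_id]):
            for dx, bit in enumerate(row):
                if bit:
                    mask[y + dy][x + dx] = 1
    return mask

def upsample(layer: ColourLayer, factor: int) -> ColourLayer:
    """Enlarge a low-resolution colour layer by repeating each pixel factor x factor times."""
    return [[layer[r // factor][c // factor] for c in range(len(layer[0]) * factor)]
            for r in range(len(layer) * factor)]

def composite(mask: Bitmap, foreground: ColourLayer, background: ColourLayer, factor: int) -> ColourLayer:
    """Where the mask is 1 show the foreground colour, elsewhere the background colour."""
    fg = upsample(foreground, factor)
    bg = upsample(background, factor)
    return [[fg[r][c] if mask[r][c] else bg[r][c] for c in range(len(mask[0]))]
            for r in range(len(mask))]

# Usage: a 6x4 "page" where the shape E is used three times and O once.
E = [[1, 1], [1, 0]]
O = [[1, 1], [1, 1]]
glyphs = [(E, 0, 0), (O, 2, 0), (E, 4, 0), (E, 0, 2)]
dictionary, placements = build_symbol_dictionary(glyphs)
print(len(dictionary), "shapes stored for", len(placements), "characters")  # 2 shapes stored for 4 characters

BLACK, WHITE = (0, 0, 0), (255, 255, 255)
mask = render_mask(dictionary, placements, width=6, height=4)
foreground = [[BLACK] * 3 for _ in range(2)]  # 3x2 low-res layer, enlarged 2x to 6x4
background = [[WHITE] * 3 for _ in range(2)]
page = composite(mask, foreground, background, factor=2)
```

The dictionary holds only two shapes even though four characters appear on the page. On a real page of text, where the same few dozen letters can repeat thousands of times, this kind of sharing is what keeps the mask layer small, while the colour layers stay cheap because they are stored at a fraction of the mask's resolution.
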

These features make DjVu an ideal format for scanning colour text documents for electronic distribution. Who knows, DjVu may even replace PDF, especially for scanned colour documents such as textbooks. The famous million book collection is an example of extensive DjVu use: it offers more than 1.5 million full-text books free of charge in open formats such as HTML, TIFF and DjVu.



