Showing posts with label pdf. Show all posts

20120228

Slick PDFs from scanned documents

It is convenient to have digital versions of your most important documents. While they will probably not hold up as a replacement should the originals disappear, the ability to produce a copy at any time with a simple print command alone is huge. Also, to be able to attach a "physical" document to an email or a job application is really handy and professional.
Let's have a look at how to generate PDFs from scanned documents that are:
  • small in file size
  • a single file per (potentially multi-page) document
  • good enough to print a faithful copy of the original
First, start with a high-quality scan of the document. Make sure the page is positioned correctly to avoid skew and shading. A resolution of 300 dpi is enough in my experience. If the document consists solely of text, use lineart (pure black and white) mode to reduce the file size. This will save you some work later.
On Linux, this command does the scan for you:
scanimage --mode=Lineart --resolution=300 --format=tiff > in.tiff
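For a document with several pages, the same command can be wrapped in a small loop. This is only a sketch: scan_pages is a made-up helper name, and it assumes a flatbed scanner where you swap pages by hand and that your scanner's backend accepts the same options as the command above.

```shell
# Sketch: scan COUNT pages into PREFIX01.tiff, PREFIX02.tiff, ...
# scan_pages is a hypothetical helper; the scanimage options are the
# ones from the single-page command above.
scan_pages() (
    count=$1
    prefix=${2:-page}
    i=1
    while [ "$i" -le "$count" ]; do
        # Zero-pad the page number so that a later page*.tiff glob
        # lists the pages in the right order.
        out=$(printf '%s%02d.tiff' "$prefix" "$i")
        printf 'Place page %d on the scanner and press Enter: ' "$i" >&2
        # Keep going even if stdin is closed (e.g. in a script).
        read -r _reply || true
        scanimage --mode=Lineart --resolution=300 --format=tiff > "$out" || exit 1
        i=$((i + 1))
    done
)
```

Calling scan_pages 3 should leave you with page01.tiff, page02.tiff and page03.tiff in the current directory.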
Second, improve the resulting image. I recommend cropping the image to the page and adjusting the levels to get a clean white background and crisp black text. If you are using the GIMP for this, choose Colors->Levels and use the black and white point tools.
The same number of colors creates a much clearer picture if the input image contained fewer colors.
Also, consider reducing the colors in parts of the image. Documents often contain color in just a small area (a signature, for example). Reducing the colors in the rest of the document lets the compression achieve good results with a smaller total number of colors. Choose black and white only for text! Don't worry about aliasing; the resolution will take care of that.
The secret to achieving small file sizes is to reduce the number of colors required to save the image. Change the color mode to indexed (Image->Mode->Indexed... in GIMP) and have it generate a palette with a small number of colors. 16 entries per color should be plenty. Leave black and white out of that calculation if you reduced them to one gray value each. So, for text-only documents you should arrive at 1 color; for text with a single additional color, 16 is more than enough.
Save the result as a TIFF file (use "Save As..." so that you get to choose the compression parameters) and tell it to use LZW compression. (Note that this file format does not support transparency, so remove the alpha channel from each layer first by right-clicking it and choosing "Remove Alpha Channel".)
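If you would rather not click through the GIMP for every page, the level adjustment, color reduction and LZW export can be approximated from the command line. The sketch below uses ImageMagick's convert, which is a different tool from the one used in the steps above, so treat it as a rough substitute and not an equivalent. reduce_colors is a made-up helper name, and the 10%/90% black and white points are only a starting guess that will need tuning per scan.

```shell
# Sketch: non-interactive stand-in for the GIMP steps, using ImageMagick.
# reduce_colors is a hypothetical helper.
#   $1 = input image, $2 = output TIFF, $3 = palette size (default 16)
reduce_colors() {
    # -alpha off      drop the alpha channel
    # -level 10%,90%  push dark grays to black and light grays to white
    #                 (assumed values, adjust to your scan)
    # -colors N       quantize to at most N colors
    # -compress LZW   write the TIFF with LZW compression
    convert "$1" -alpha off -level 10%,90% -colors "${3:-16}" -compress LZW "$2"
}
```

For example, reduce_colors in.tiff page01.tiff 16 would write a page with at most 16 colors. Selective reduction, such as keeping color only around a signature, still needs an image editor.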
The resulting image should be far smaller than the original. Less than 200 KB is realistic for black and white, and not much more than 1 MB is feasible for anything with more colors.
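To check those numbers across many pages without opening a file manager, something like the following should do. Both function names are invented for this example, and the limit is simply whatever threshold you pass in (200 for black and white pages, say).

```shell
# Sketch: print the size of each file in KB and flag the ones over a limit.
# size_kb and report_sizes are hypothetical helper names.
size_kb() {
    bytes=$(wc -c < "$1") || return 1
    # Round up to whole kilobytes.
    echo $(( ($bytes + 1023) / 1024 ))
}

report_sizes() {
    limit_kb=$1
    shift
    for f in "$@"; do
        kb=$(size_kb "$f") || continue
        if [ "$kb" -gt "$limit_kb" ]; then
            echo "$f: ${kb} KB (over ${limit_kb} KB)"
        else
            echo "$f: ${kb} KB"
        fi
    done
}
```

Running report_sizes 200 page*.tiff lists every page and marks the ones that probably still carry more colors than they need.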
Next, if your document consists of more than one page, repeat the steps above for each. Then join them together using the command tiffcp:
tiffcp -c lzw page1.tiff page2.tiff ... out.tiff
Of course, replace the pageX.tiff with the filenames you actually chose.
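With many pages, a shell glob saves typing, but the order matters: the shell sorts names as text, so page10.tiff comes before page2.tiff unless the numbers are zero-padded (page02.tiff). Here is a small wrapper around the tiffcp call above, under the assumption that your pages are named that way. join_pages is a hypothetical name.

```shell
# Sketch: join page files into one multi-page TIFF.
# join_pages is a hypothetical helper.
#   $1 = output file, remaining arguments = pages in the desired order
join_pages() (
    out=$1
    shift
    [ "$#" -ge 1 ] || { echo "usage: join_pages OUT PAGE..." >&2; exit 2; }
    for f in "$@"; do
        # Fail early instead of producing a document with a page missing.
        [ -f "$f" ] || { echo "missing page: $f" >&2; exit 1; }
    done
    tiffcp -c lzw "$@" "$out"
)
```

Usage would be join_pages out.tiff page*.tiff, which expands to the pages in sorted order followed by nothing else, since out.tiff does not match the pattern.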
Finally, to turn this TIFF file into a PDF, use tiff2pdf. This should work:
tiff2pdf -z -pa4 -F out.tiff -o out.pdf
You can also specify the author information (your name, presumably) using the -a parameter and the title using -t. So maybe:
tiff2pdf -z -pa4 -F out.tiff -o out.pdf -a "John Doe" -t "Master's Certificate"
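If you convert documents regularly, the call can be wrapped so that author and title are only passed when you actually have them. A minimal sketch, with tiff_to_pdf as an invented name and the same options as above:

```shell
# Sketch: convert a TIFF to an A4 PDF with optional metadata.
# tiff_to_pdf is a hypothetical helper.
#   $1 = input TIFF, $2 = output PDF, $3 = author (optional), $4 = title (optional)
tiff_to_pdf() (
    in=$1
    out=$2
    author=$3
    title=$4
    # Build the option list step by step so that empty metadata is skipped.
    set -- -z -pa4 -F
    [ -n "$author" ] && set -- "$@" -a "$author"
    [ -n "$title" ] && set -- "$@" -t "$title"
    tiff2pdf "$@" -o "$out" "$in"
)
```

For example: tiff_to_pdf out.tiff out.pdf "John Doe" "Master's Certificate". Quoting the arguments keeps the apostrophe and the spaces intact.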
And that should be it!
Did I miss anything? Do you have more tricks up your sleeve for how to make a really high quality digital version of your documents? Share them in the comments!

20090513

Okular - KDE4 PDF Viewer (and much more)

One of the exciting new programs introduced in KDE 4.0 is a unified document viewer called Okular.
Based on the solid foundation of KPDF from the KDE 3 era, Okular became a document viewer with many more features and support for a large number of formats. When I wrote about it in my blog post about KDE 4.0.0, it supported 28 file types. In version 0.8.2 (part of KDE 4.2.2), as shipped with Kubuntu Jaunty, it supports 45 file types. Note that these numbers can vary greatly depending on which packages providing file type support are installed, and that some of the supported types are just compressed versions of others.
Here is what it looks like displaying, in order: a PDF file, a JPG image, an OpenDocument document and a Comic Book Archive comic. (This is a later version, 0.14.1; the original screenshots were lost and had to be redone.)
Additionally, Okular supports a presentation mode (not pictured) as well as annotations and reviews.
Okular is slick, powerful and easy to use. It is a great example of the new user interaction ideals seen in all components of KDE 4.