Working with PDF Files on Linux

PDF (Portable Document Format) is a popular file format created by Adobe in the early 1990s. Even before it was released as an open standard in 2008, PDF was widely used for document exchange because of its universal compatibility. In other words, if you want your documents to look consistent everywhere, you should use PDF. It supports font embedding, which means that other users will see the text just like you formatted it, even if they don’t have the same fonts as yours.

Note that some fonts are not embeddable, so they are usually automatically replaced by standard system fonts.

PDF files can contain interactive elements – form fields, annotations, even 3D objects – and can be digitally signed or encrypted. Adobe has developed nine versions (specifications) of PDF, and each new version is backward-inclusive, meaning that it supports all features added in previous versions. The safest practice when creating PDF files is to use the latest stable version of PDF (currently 1.7).

Creating PDF Files

The easiest way to create a PDF document is to use the “Export to PDF” function available in OpenOffice and LibreOffice. All applications in the office suite support this, so apart from text documents, you can export spreadsheets, graphics, forms and presentations. “Export to PDF” can be accessed either from the File menu or from the Standard toolbar.

The dialog box offers five tabs (General, Initial View, User Interface, Links and Security) where you can tweak the appearance of your PDF file. Customization options abound – you can adjust how the document will be displayed (as a single page, full-screen, without toolbars…) as well as which elements to export (images, comments, links…). You can also password-protect your PDF file, restrict printing and copying, and much more.
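
If you have a whole folder of documents to export, clicking through the dialog for each one gets tedious. LibreOffice can also convert files from the console without opening a window. The snippet below is only a sketch: it assumes the launcher is installed as soffice (some distributions call it libreoffice), the function name export_to_pdf is made up for this example, and the export uses default settings rather than the options from the dialog.

```shell
# Sketch: batch-export office documents to PDF without opening the GUI.
# Assumption: the LibreOffice launcher is available as "soffice".
# export_to_pdf is a hypothetical helper name, not part of LibreOffice.
export_to_pdf() {
    outdir=$1      # folder that will receive the PDF files
    shift          # the remaining arguments are the documents to convert
    soffice --headless --convert-to pdf --outdir "$outdir" "$@"
}

# Usage (file names are placeholders):
#   export_to_pdf ./pdf report.odt budget.ods slides.odp
```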

You can extend this functionality to any application that supports printing by installing a virtual printer. For this you’ll need a software package called cups-pdf. The installation procedure largely depends on the Linux distribution, but the package should be available in the official repositories, so you can easily find it in your package manager. Additional setup is not necessary on Ubuntu and its spin-offs, as the virtual printer is automatically recognized and ready to use. You can configure advanced options by selecting the PDF printer in System > Administration > Printing.

Other distributions might require that you manually add the printer via the Printer configuration dialog, which is usually found under System Settings or Administration Tools. Choose “Generic” for both the printer make and the driver, and enter cups-pdf:/ as the device URI. If everything went well, you will now see a “Print to PDF” or “Print to file” option in every application when you try to print files. This is useful for printing web pages, because you don’t have to install special browser add-ons, and it basically counts as HTML to PDF conversion.
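
Once the virtual printer is set up, you can also use it from the console with the standard CUPS lp command. This is a rough sketch, assuming the print queue is called PDF, which is the usual name on Ubuntu; run lpstat -p to check what it is called on your system. The helper name print_to_pdf is invented for this example.

```shell
# Sketch: send a file to the cups-pdf virtual printer from the shell.
# Assumption: the queue is named "PDF"; pass another name as the
# second argument if `lpstat -p` shows something different.
print_to_pdf() {
    file=$1
    queue=${2:-PDF}
    lp -d "$queue" "$file"
}

# Usage (file name is a placeholder):
#   print_to_pdf notes.txt
```

The resulting PDF usually ends up in a folder named PDF in your home directory, but the exact location depends on the cups-pdf configuration.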

Converting PDF Files

File conversion is best when it works both ways. This is the case on Linux, where you can convert PDF files to other file formats and vice versa. Most commonly, users want to convert scanned images to PDF or split a PDF file into images. The most powerful tool for this job is ImageMagick. It’s a command-line tool, but don’t let this scare you because it’s really easy to use. For example, to convert a PDF file to multiple images, write the following command into your favorite console application:

convert your-file.pdf image.jpg

Simple as that. The result is one JPG image per page of the original PDF file, named image-0.jpg, image-1.jpg and so on. Make sure to execute this command in the folder containing the PDF file.

Similarly, if you wish to join images into a PDF file, just write:

convert *.png your-document.pdf

The asterisk means that all files with .png extension in that folder will be converted. If you want to convert only a few images, enter their full filenames separated by spaces.
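
The two conversions above can be wrapped into small shell functions with a couple of commonly used options. Treat this as a sketch rather than a recipe: the function names are made up, the 150 dpi default and the quality value of 90 are arbitrary choices, and newer ImageMagick releases (version 7) prefer the magick command over convert.

```shell
# Sketch: PDF -> numbered JPG files at a chosen resolution.
# -density has to come before the input file so that it applies
# while the PDF pages are being rasterized.
pdf_to_jpgs() {
    pdf=$1
    dpi=${2:-150}                       # assumed default resolution
    convert -density "$dpi" "$pdf" -quality 90 "${pdf%.pdf}-%03d.jpg"
}

# Sketch: join images into one PDF; the output file comes first,
# followed by the images in the order they should appear.
images_to_pdf() {
    out=$1
    shift
    convert "$@" "$out"
}

# Usage (file names are placeholders):
#   pdf_to_jpgs your-file.pdf 300
#   images_to_pdf your-document.pdf page1.png page2.png
```

The %03d part of the output name is replaced with a zero-padded page counter, which keeps the images sorted correctly in a file manager. If the conversion is refused, check the ImageMagick security policy on your system, since some distributions block PDF handling by default.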

ImageMagick supports more than 100 image formats and has almost as many options. You can learn all about them in the man page (type man imagemagick in the console) or in the official online documentation.

If you want to extract text from a PDF document, there’s a small command-line utility called pdftotext. It comes in the poppler-utils package and has a really simple syntax:

pdftotext your-document.pdf textfile.txt
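
pdftotext also accepts a handful of options. The sketch below shows three of them: -f and -l select the first and last page, -layout tries to preserve the original text layout, and -upw supplies the user password for a protected file. The wrapper function extract_text and its argument order are invented for this example.

```shell
# Sketch: extract a page range as text, optionally with a password.
# extract_text is a hypothetical helper; the output file gets the
# same name as the PDF with a .txt extension.
extract_text() {
    pdf=$1
    first=$2
    last=$3
    password=$4                         # optional user password
    if [ -n "$password" ]; then
        pdftotext -upw "$password" -f "$first" -l "$last" -layout "$pdf" "${pdf%.pdf}.txt"
    else
        pdftotext -f "$first" -l "$last" -layout "$pdf" "${pdf%.pdf}.txt"
    fi
}

# Usage (file name and password are placeholders):
#   extract_text your-document.pdf 1 3
#   extract_text protected.pdf 2 4 secret
```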

It can convert protected and encrypted files (provided that you know the password), and you can specify which pages to convert. Note that you won’t be able to convert PDF files which are made from images, as pdftotext does not have OCR capabilities. Luckily, there are many OCR tools to choose from. Some are free, others cost a fortune. Personally, I recommend OnlineOCR because it supports multiple languages and in most cases produces excellent results.

Allergic to the command line? Don’t panic – ImageMagick has a nice graphical frontend called Converseen. The interface is straightforward, so you shouldn’t have trouble using it. The installation process is described on the project’s webpage.

Another simple graphical tool worth mentioning is gscan2pdf. Though it might appear ascetic, it boasts plenty of options for scanning, file conversion and creating PDF files from scanned documents. The most remarkable is the OCR capability, meaning that gscan2pdf can extract text from documents scanned as images.

Viewing and Editing PDF Files

Every (major) Linux distribution comes with a default PDF viewer. The most popular are Evince and Okular. Both support many file formats and share standard features such as side-by-side display, thumbnails, bookmarks and text selection. However, Evince lacks some advanced options and is therefore considered lightweight. Okular supports annotation, highlighting, commenting and even extracting text from PDF to a separate file. The downside might be a bunch of KDE dependencies which you’ll have to install if you don’t have KDE.

Windows-nostalgic users needn’t worry, because both Adobe and Foxit offer Linux versions of their popular PDF readers. Sadly, they are stripped of some useful features, such as commenting and form filling, so it’s much better to use native Linux applications.

Finally, the most challenging part – editing PDF files. Today it’s much easier to edit PDF files than before thanks to a host of available tools. Some of them are single-purpose, while others sport many different functions.

PDFtk (PDF toolkit) is one of the latter – it’s a command-line tool which can rearrange, add, rotate or delete pages from a PDF document. You can also encrypt, decrypt, join and even repair corrupted PDF files. PDFtk also runs on Windows and Mac OS X, and it’s available in the repositories of many Linux distributions. After you’ve installed it, you can access its full documentation and usage examples by typing man pdftk into the console.
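
To give you a taste of the syntax, here is a sketch of a few common pdftk operations wrapped in shell functions. The function names are invented for this example, and the rotation keyword east follows the syntax of recent pdftk releases (older versions used single letters such as E), so check man pdftk if it doesn’t work on your system.

```shell
# Join several PDF files into one; the output file comes first.
pdf_join() {
    out=$1
    shift
    pdftk "$@" cat output "$out"
}

# Keep only a page range such as 2-5 from the input file.
pdf_extract() {
    pdftk "$1" cat "$2" output "$3"
}

# Rotate every page 90 degrees clockwise.
pdf_rotate_cw() {
    pdftk "$1" cat 1-endeast output "$2"
}

# Encrypt a file so that it asks for a password when opened.
pdf_encrypt() {
    pdftk "$1" output "$2" user_pw "$3"
}

# Usage (file names and password are placeholders):
#   pdf_join merged.pdf first.pdf second.pdf
#   pdf_extract your-document.pdf 2-5 excerpt.pdf
#   pdf_rotate_cw scanned.pdf upright.pdf
#   pdf_encrypt report.pdf report-locked.pdf secret
```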

PDF Shuffler shares some functionality with pdftk – cropping, rotating, rearranging, exporting and removing pages – plus it has a clean graphical interface. You import (open) a PDF file, make desired changes and export (save) it. The only problem is that you’ll have to open multiple instances of PDF Shuffler if you want to edit more PDF files at the same time.

Similar tools are jPDF Tweak, which supports adding watermarks, bookmarks and page transitions, and PDF Split and Merge which is simple and lightweight. Another great tool is PDF Mod, and the feature that sets it apart from the rest is the ability to edit PDF metadata (title, author, keywords). True, pdftk can also do this, but if you prefer GUI applications, PDF Mod is perfect for you.
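
For the curious, editing metadata with pdftk is a two-step job: dump the existing data to a text file, change the InfoValue lines you care about (for example the one following InfoKey: Title), then write the edited data back. A minimal sketch, with made-up function names:

```shell
# Step 1: save the metadata of a PDF file to a plain text file.
pdf_dump_info() {
    pdftk "$1" dump_data output "$2"
}

# Step 2: after editing the text file, apply it to a copy of the PDF.
pdf_update_info() {
    pdftk "$1" update_info "$2" output "$3"
}

# Usage (file names are placeholders):
#   pdf_dump_info your-document.pdf info.txt
#   ...edit info.txt in a text editor...
#   pdf_update_info your-document.pdf info.txt your-document-tagged.pdf
```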

jpdfbookmarks does what its name says – it’s a handy utility for adding and editing bookmarks to your PDF files. Unfortunately, it doesn’t provide document preview, so you’ll have to open your PDF in some other application alongside jpdfbookmarks.

When you have to make substantial changes to a PDF file, LibreOffice Draw and Xournal are probably the best choices. Draw uses layers and treats pages as images, but it will let you add and edit both objects and text. Xournal is primarily a note-taking application, but it allows you to import PDF files and annotate them: you can highlight, add or erase parts of text, and then export the modified file to PDF. If you’re looking for advanced, feature-rich PDF editing applications, try PDF Studio, PDFedit and Master PDF Editor, or stay tuned for future articles and advice on their usage.





