
Working with PDF Files on Linux

PDF (Portable Document Format) is a popular file format created by Adobe in the early 1990s. Even before it was released as an open standard in 2008, PDF was widely used for document exchange because of its universal compatibility. In other words, if you want your documents to look consistent everywhere, you should use PDF. It supports font embedding, which means that other users will see the text just as you formatted it, even if they don't have the same fonts installed.

Note that some fonts are not embeddable, so they are usually automatically replaced by standard system fonts.

PDF files can contain interactive elements – form fields, annotations, even 3D objects – and can be digitally signed or encrypted. Adobe has developed nine versions (specifications) of PDF, and each new version is backward-inclusive, meaning that it supports all features added in previous versions. The safest practice when creating PDF files is to use the latest stable version of PDF (currently 1.7).

Creating PDF Files

The easiest way to create a PDF document is to use the “Export to PDF” function available in OpenOffice and LibreOffice. All applications in the office suite support this, so apart from text documents, you can export spreadsheets, graphics, forms and presentations. “Export to PDF” can be accessed either from the File menu or from the Standard toolbar.

"Export to PDF" dialog box in LibreOffice

The dialog box offers five tabs (General, Initial View, User Interface, Links and Security) where you can tweak the appearance of your PDF file. Customization options abound – you can adjust how the document will be displayed (as a single page, full-screen, without toolbars...) as well as which elements to export (images, comments, links...). You can also password-protect your PDF file, restrict printing and copying, and much more.

You can extend this functionality to any application that supports printing by installing a virtual printer. For this you'll need a software package called cups-pdf. The installation procedure largely depends on the Linux distribution, but the package should be available in the official repositories, so you can easily find it in your package manager. Additional setup is not necessary on Ubuntu and its spin-offs, as the virtual printer is automatically recognized and ready to use. You can configure advanced options by selecting the PDF printer in System > Administration > Printing.

Other distributions might require that you manually add the printer via the Printer configuration dialog, which is usually found under System Settings or Administration Tools. Choose “Generic” for both the printer make and the driver, and enter cups-pdf:/ as the device URI. If everything went well, you will now see a “Print to PDF” or “Print to file” option in every application when you try to print files. This is useful for printing web pages, because you don't have to install special browser add-ons, and it basically counts as HTML to PDF conversion.
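If you prefer to check the setup from a terminal, the standard CUPS command-line tools can do the same job. The sketch below is only an illustration: the helper names add_pdf_queue and print_to_pdf are made up for this example, the queue name PDF is an assumption, and the driver model is not guessed here, so you would have to look it up on your own system with lpinfo -m.

```shell
# Hypothetical helpers built on the standard CUPS tools (lpadmin, lpstat, lp).

# Add a virtual PDF queue. MODEL is whatever driver name `lpinfo -m`
# lists for cups-pdf on your system.
add_pdf_queue() (
    model=$1
    queue=${2:-PDF}                 # assumed queue name
    if [ -z "$model" ]; then
        echo "usage: add_pdf_queue MODEL [QUEUE]" >&2
        exit 2
    fi
    # -p names the queue, -E (placed after -p) enables it,
    # -v sets the device URI mentioned in the article
    lpadmin -p "$queue" -E -v cups-pdf:/ -m "$model"
)

# Send a file to the PDF queue, refusing early if the queue does not exist.
print_to_pdf() (
    file=$1
    queue=${2:-PDF}
    if ! lpstat -p "$queue" >/dev/null 2>&1; then
        echo "printer queue '$queue' not found" >&2
        exit 1
    fi
    lp -d "$queue" "$file"
)
```

For example, print_to_pdf notes.txt would queue notes.txt on the virtual printer. Where the resulting PDF ends up depends on the cups-pdf configuration of your distribution.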


Export web pages to PDF in Firefox

Converting PDF Files

File conversion is best when it works both ways. This is the case on Linux, where you can convert PDF files to other file formats and vice versa. Most commonly, users want to convert scanned images to PDF or split a PDF file into images. The most powerful tool for this job is ImageMagick. It's a command-line tool, but don't let this scare you because it's really easy to use. For example, to convert a PDF file to multiple images, write the following command into your favorite console application:

convert your-file.pdf image.jpg

Simple as that. The result is X JPG images, where X is the number of pages in the original PDF file; ImageMagick numbers them image-0.jpg, image-1.jpg and so on. Make sure to execute this command in the folder containing the PDF file.
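By default ImageMagick renders PDF pages at 72 DPI, which can leave text looking blurry. Here is a minimal sketch of a sharper conversion, assuming you want every page as a numbered JPG in the current folder. The helper name pdf_to_jpgs is hypothetical, and the 150 DPI default and quality 90 are arbitrary choices; -density and -quality are regular ImageMagick options.

```shell
# Hypothetical helper: render every page of a PDF to a numbered JPG.
pdf_to_jpgs() (
    pdf=$1
    dpi=${2:-150}                    # rendering resolution (assumed default)
    base=$(basename "$pdf" .pdf)     # docs/report.pdf -> report
    # -density goes before the input so it applies while the PDF is rasterized;
    # %03d in the output name is replaced by a zero-padded page counter
    convert -density "$dpi" "$pdf" -quality 90 "${base}-%03d.jpg"
)
```

Running pdf_to_jpgs your-file.pdf 200 would render the pages at 200 DPI into files such as your-file-000.jpg and your-file-001.jpg.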
Similarly, if you wish to join images into a PDF file, just write:

convert *.png your-document.pdf

The asterisk means that all files with .png extension in that folder will be converted. If you want to convert only a few images, enter their full filenames separated by spaces.
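One thing to watch out for: the shell expands *.png in alphabetical order, so page-10.png lands before page-2.png. When page order matters, listing the files explicitly is the safer route. A small sketch, with images_to_pdf being a made-up helper name:

```shell
# Hypothetical helper: join the given images, in the order given, into one PDF.
images_to_pdf() (
    out=$1
    if [ "$#" -lt 2 ]; then
        echo "usage: images_to_pdf OUTPUT.pdf IMAGE..." >&2
        exit 2
    fi
    shift                        # drop the output name, leaving only the images
    convert "$@" "$out"
)
```

For instance, images_to_pdf your-document.pdf page-2.png page-10.png keeps page 2 ahead of page 10.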
ImageMagick supports more than 100 image formats and has almost as many options. You can learn all about them in the man page (type man imagemagick in the console) or in the official online documentation.

If you want to extract text from a PDF document, there's a small command-line utility called pdftotext. It comes in the poppler-utils package and has a really simple syntax:

pdftotext your-document.pdf textfile.txt
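pdftotext also takes a handful of options: -f and -l set the first and last page to convert, -layout tries to keep the original physical layout, and -upw supplies the user password of a protected file. The sketch below combines them; the helper name pdf_pages_to_text and the output naming scheme are assumptions made for this example.

```shell
# Hypothetical helper: extract a page range from a PDF into a text file.
# Usage: pdf_pages_to_text FILE.pdf FIRST LAST [PASSWORD]
pdf_pages_to_text() (
    pdf=$1
    first=$2
    last=$3
    password=${4:-}
    out="${pdf%.pdf}-p${first}-${last}.txt"   # report.pdf -> report-p2-5.txt

    # Build the option list, adding the password only when one was given
    set -- -f "$first" -l "$last" -layout
    if [ -n "$password" ]; then
        set -- "$@" -upw "$password"
    fi
    pdftotext "$@" "$pdf" "$out"
)
```

So pdf_pages_to_text your-document.pdf 2 5 would write pages 2 to 5 into your-document-p2-5.txt. That covers the everyday uses of pdftotext.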

It can convert protected and encrypted files (provided that you know the password), and you can specify which pages to convert. Note that you won't be able to convert PDF files which are made from images, as pdftotext does not have OCR capabilities. Luckily, there are many OCR tools to choose from. Some are free, others cost a fortune. Personally, I recommend OnlineOCR because it supports multiple languages and in most cases produces excellent results.

Allergic to the command line? Don't panic: ImageMagick has a nice graphical frontend called Converseen.

Converseen interface

The interface is straightforward, so you shouldn't have trouble using Converseen. The installation process is described on the project's webpage.

Another simple graphical tool worth mentioning is gscan2pdf. Though it might appear ascetic, it boasts plenty of options for scanning, file conversion and creating PDF files from scanned documents. The most remarkable is the OCR capability, meaning that gscan2pdf can extract text from documents scanned as images.

gscan2pdf interface

Viewing and Editing PDF Files

Every (major) Linux distribution comes with a default PDF viewer. The most popular are Evince and Okular. Both support many file formats and share standard features such as side-by-side display, thumbnails, bookmarks and text selection. However, Evince lacks some advanced options and is therefore considered lightweight. Okular supports annotation, highlighting, commenting and even extracting text from PDF to a separate file. The downside might be a bunch of KDE dependencies which you'll have to install if you don't have KDE.

Adding notes in Okular

Windows-nostalgic users needn't worry, because both Adobe and Foxit offer Linux versions of their popular PDF readers. Sadly, they are stripped of some useful features, such as commenting and form filling, so it's much better to use native Linux applications.

Finally, the most challenging part – editing PDF files. Today it's much easier to edit PDF files than before thanks to a host of available tools. Some of them are single-purpose, while others sport many different functions.

PDFtk (PDF toolkit) is one of the latter – it's a command-line tool which can rearrange, add, rotate or delete pages from a PDF document. You can also encrypt, decrypt, join and even repair corrupted PDF files. PDFtk also runs on Windows and Mac OS X, and it's available in the repositories of many Linux distributions. After you've installed it, you can access its full documentation and usage examples by typing man pdftk into the console.
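To give a feel for the syntax, here is a hedged sketch of three common jobs. The cat operation, the output keyword and the user_pw option are part of pdftk itself; the helper names pdf_keep_pages, pdf_join and pdf_lock are invented for this example.

```shell
# Hypothetical wrappers around pdftk.

# Keep only the listed page ranges; "1-4 6-end" drops page 5.
pdf_keep_pages() (
    if [ "$#" -lt 3 ]; then
        echo "usage: pdf_keep_pages IN.pdf OUT.pdf RANGE..." >&2
        exit 2
    fi
    in=$1
    out=$2
    shift 2                      # what remains are the page ranges
    pdftk "$in" cat "$@" output "$out"
)

# Join several PDFs, in the order given, into one file.
pdf_join() (
    if [ "$#" -lt 2 ]; then
        echo "usage: pdf_join OUT.pdf IN.pdf..." >&2
        exit 2
    fi
    out=$1
    shift
    pdftk "$@" cat output "$out"
)

# Write an encrypted copy that needs a password to open.
pdf_lock() (
    pdftk "$1" output "$2" user_pw "$3"
)
```

For example, pdf_keep_pages in.pdf out.pdf 1-4 6-end removes page 5, and pdf_join all.pdf a.pdf b.pdf appends b.pdf to a.pdf.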

PDF Shuffler shares some functionality with pdftk – cropping, rotating, rearranging, exporting and removing pages – plus it has a clean graphical interface. You import (open) a PDF file, make the desired changes and export (save) it. The only problem is that you'll have to open multiple instances of PDF Shuffler if you want to edit several PDF files at the same time.

PDF Shuffler interface

Similar tools are jPDF Tweak, which supports adding watermarks, bookmarks and page transitions, and PDF Split and Merge, which is simple and lightweight. Another great tool is PDF Mod, and the feature that sets it apart from the rest is the ability to edit PDF metadata (title, author, keywords). True, pdftk can also do this, but if you prefer GUI applications, PDF Mod is perfect for you.
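For those who would rather stay in the console, the pdftk route is a round trip: dump the metadata into a text file, edit the values by hand, then write them into a new PDF. A minimal sketch, assuming the hypothetical helper names pdf_dump_info and pdf_apply_info (dump_data and update_info are actual pdftk operations):

```shell
# Hypothetical wrappers for the pdftk metadata round trip.

# Step 1: write the document's metadata to a text file you can edit.
pdf_dump_info() (
    pdftk "$1" dump_data output "$2"
)

# Step 2: after editing the text file, apply it and save a new PDF.
pdf_apply_info() (
    pdftk "$1" update_info "$2" output "$3"
)
```

A typical session would be pdf_dump_info in.pdf info.txt, then editing info.txt in a text editor, then pdf_apply_info in.pdf info.txt out.pdf.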

jpdfbookmarks does what its name says – it's a handy utility for adding and editing bookmarks in your PDF files. Unfortunately, it doesn't provide a document preview, so you'll have to open your PDF in some other application alongside jpdfbookmarks.

When you have to make substantial changes to a PDF file, LibreOffice Draw and Xournal are probably the best choices. Draw uses layers and treats pages as images, but it will let you add and edit both objects and text. Xournal is primarily a note-taking application, but it allows you to import PDF files and annotate them: you can highlight, add or erase parts of the text, and then export the modified file to PDF. If you're looking for advanced, feature-rich PDF editing applications, try PDF Studio, PDFedit and Master PDF Editor, or stay tuned for future articles and advice on their usage.




Ivana Isadora Devcic is a freelance writer, copyeditor and translator fluent in English, Swedish, Croatian and Norwegian. She's a Linux user and KDE fan interested in web design, productivity and personal branding. Ivana tweets about the world around her as @skadinna.

Reader Comments (1)

Thanks Ivana for your great review. Although PDF is one of the standards with Linux, it also has been a pain for ages, especially for people that contemplate/ have just taken the plunge and switched. If you do, Acrobat (Pro that is) seems a hard one to replace. You are right that nowadays most of the functionalities can be done with Linux. But as with more things it takes you 4 different programs and takes you 10 times longer. Your excellent review hits the nail right on the head in more than one ways: perhaps it would be good when Linux finally gets its all-in-one PDF reader/ editor that makes it really easy to do all these things with one program (e.g. highlight, comment, insert/ rotate/ delete pages, bookmarks, secure) and becomes really functional.

October 13, 2013 | Unregistered Commenter Gina
