Manipulating PDF files

There are a variety of tools available for viewing and manipulating PDF files.

The department provides a custom tool for splitting large CoreHR recruitment PDF files into one file per candidate. It is available at https://www.maths.ox.ac.uk/tools/pdf_split

On the Linux Desktops

Viewing

  • evince - also has an annotate feature
  • okular - also has a tools -> review feature for annotation, or a simpler interface for annotation when in full screen presentation mode
  • xpdf
  • or even a browser such as google-chrome or firefox

Manipulating

  • pdftk - split/merge documents, reorder/rotate pages, repair, view/modify metadata etc [no longer available]
  • qpdf - split/merge documents, reorder/rotate pages, repair, view/modify metadata etc
  • pdfshuffler - GUI option for page rotation, reorder etc.
  • pdfjoin - simple joining tool
  • pdfnup - reformat to N pages per physical page
  • pdf90 - simple tool to rotate document 90 degrees
  • pdfimages - extract images
  • pdftotext - extract text
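
As a rough sketch of the split/merge/rotate operations listed above, the following wraps a few common qpdf invocations in small shell functions. The function names (run, pdf_merge, pdf_extract, pdf_rotate), the DRY_RUN switch and all file names are made up for illustration; only the qpdf options themselves (--empty, --pages, --rotate) come from qpdf. With DRY_RUN=1 the commands are only printed, so you can check them before touching any files.

```shell
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pdf_merge OUT IN1 IN2 ... : join whole input files, in order, into OUT
pdf_merge() {
    out=$1
    shift
    run qpdf --empty --pages "$@" -- "$out"
}

# pdf_extract IN RANGE OUT : copy a page range such as 1-5 from IN into OUT
pdf_extract() {
    run qpdf --empty --pages "$1" "$2" -- "$3"
}

# pdf_rotate IN RANGE OUT : rotate the given pages by a further 90 degrees
pdf_rotate() {
    run qpdf --rotate=+90:"$2" "$1" "$3"
}

# Example calls with placeholder file names; DRY_RUN=1 prints instead of running
DRY_RUN=1
pdf_merge combined.pdf part1.pdf part2.pdf
pdf_extract big.pdf 1-5 first5.pdf
pdf_rotate scan.pdf 1-3 fixed.pdf
```

Setting DRY_RUN=0 (or leaving it unset) makes the same calls run qpdf for real, assuming qpdf is installed and the input files exist.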

On the Windows Desktops

Viewing

  • Acrobat Reader
  • Nuance PDF Professional
  • PDFXchangeViewer

Manipulating

The recommended tool for all PDF manipulations is Nuance PDF Converter Enterprise, which is licensed and recommended by the university. It is installed on all desktops in the department.

  • Nuance PDF Converter Enterprise - very flexible tool for annotating, splitting, creating, reordering pages etc
  • PDF Creator - produce PDF version of documents, join/split documents
  • PDFXchangeViewer - as well as viewing it can perform various modifications too

In particular, note that in Nuance the operations for reordering pages etc are in the Document menu under the pages option, whereas to join PDFs you choose to create a New PDF from the file menu and pick the combine multiple PDFs option.

Viewing and modifying PDF metadata

PDF documents will typically contain metadata such as the creator application, author, creation date etc. The exact data included depends on the application used to create the PDF file and, where the PDF was produced from an existing document, on any metadata that document already carried. For example, the author data may already be set and will remain unchanged when the PDF is created.

One can view (some of) the metadata in a PDF with various tools, e.g. in Acrobat Reader look at the document properties. Various tools exist that will allow you to edit (some of) the metadata.

There are times when it may be important to check the metadata in a file and potentially remove it, e.g. when producing a referee's report you would not want any data that may relate to the referee to remain in the metadata. If the PDF has been created from a Word document it is quite likely that by default it will contain the author's name in the metadata. Other applications such as LaTeX are less likely to result in such information being included.

One tool that can list all the metadata in the PDF file is pdftk (available for Linux, Windows and Mac).

The command

pdftk myfile.pdf dump_data

will give you a listing of all the metadata in a PDF file.
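
The listing can be long, so when checking for something like an author name it may help to reduce it to one line per Info entry. The sketch below does this with awk. The function name info_pairs and the sample text are invented for illustration; the sample only imitates the InfoKey/InfoValue layout of dump_data output and is not taken from a real file.

```shell
# info_pairs: read pdftk dump_data text on stdin and print "Key = Value"
# for each Info entry (hypothetical helper name)
info_pairs() {
    awk '
        /^InfoKey: /  { key = substr($0, 10) }
        /^InfoValue:/ { val = $0; sub(/^InfoValue: ?/, "", val); print key " = " val }
    '
}

# Made-up sample standing in for real dump_data output
sample='InfoBegin
InfoKey: Author
InfoValue: A. Referee
InfoBegin
InfoKey: Creator
InfoValue: Microsoft Word
NumberOfPages: 3'

printf '%s\n' "$sample" | info_pairs
# prints:
#   Author = A. Referee
#   Creator = Microsoft Word
```

Against a real file the same function would be used as: pdftk myfile.pdf dump_data | info_pairs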

One can modify this output and feed it back to pdftk to wipe all the metadata in the file. For example, on a Linux or Mac system the command

pdftk myfile.pdf dump_data | sed -e 's/^InfoValue:.*/InfoValue:/g' | pdftk myfile.pdf update_info - output myfile-clean.pdf

will set all the values to empty and write them back to a new PDF file.
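
To see what the sed step in the middle of that pipeline does on its own, here is a small sketch that applies the same substitution to some made-up dump_data text. The function name blank_info_values and the sample content are assumptions for illustration only; no PDF or pdftk is needed to try it.

```shell
# blank_info_values: read pdftk dump_data text on stdin and empty every
# InfoValue line, leaving all other lines untouched (hypothetical helper name)
blank_info_values() {
    sed -e 's/^InfoValue:.*/InfoValue:/'
}

# Made-up sample standing in for real dump_data output
sample='InfoBegin
InfoKey: Author
InfoValue: A. Referee
InfoBegin
InfoKey: Creator
InfoValue: Microsoft Word
NumberOfPages: 3'

# The InfoKey lines survive, but each InfoValue line is reduced to "InfoValue:"
printf '%s\n' "$sample" | blank_info_values
```

It is worth running dump_data again on the cleaned file afterwards to confirm that the values you care about are empty.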

A free graphical tool for viewing and clearing metadata from a PDF file under Windows is BeCyPDFMetaEdit, which is installed on all departmental Windows desktops. Open the relevant PDF file with BeCyPDFMetaEdit, making sure you use the complete rewrite mode (as opposed to the default incremental mode, where changes can be undone), click on the clear all metadata button, and then save the file.

Creating a fillable PDF form

Whilst you can create a form in MS Word for others to fill in, you may find the form displays differently for different people in different versions of Word, or when they use other document editors such as OpenOffice or AbiWord. You can of course test your form in other applications in an effort to make it as robust as possible.

Another option is to turn the form into a fillable PDF document. A PDF form locks down the formatting and gives the person filling it in only boxes on the page to work in. They can then print the filled in form (or save it).

There are some details of how to do this with the Nuance Converter Professional PDF software (which is installed on the departmental Windows systems) at http://community.nuance.com/wikis/pdf6/creating-pdf-forms.aspx and in the related links in the right panel of that page.

The basic process is:

  • Create the form in MS Word and then output a PDF version, e.g. from the Nuance PDF menu in MS Word.
  • Open the PDF in Nuance.
  • From the top menus select View -> Toolbars -> Form Tools so you see the form tools.
  • From the form tools toolbar click the Form Typer button. This tries to autodetect areas of the page and turn them into PDF form elements.
  • Click on the highlight form fields button so you can more clearly see what it has identified.
  • Use the buttons on the left of the forms toolbar to select the missed text areas as form elements.
  • Correct any other form areas, e.g. grab the corners of any elements that are not in the right place and resize them.
  • Save the file under a new name and you now have the fillable form version.

People can fill in the resulting form using the free Acrobat Reader or other free PDF tools.

When someone uses the form in the free Adobe Reader they cannot save the completed form. However, they can print it to file as a PDF and hence save the finished version electronically.

Last updated on 23 Apr 2022 12:56.