Counting the number of colour pages in a PDF

This is a little aside from the audio focus of this blog. I recently completed my PhD thesis, which involved a lot of costly printing. Many printers charge one rate for black & white pages and a separate, much higher rate for colour pages, so I wanted to estimate how much a single print of the thesis would cost. To do that I needed to know:

  • The number of pages in the thesis
  • How many of these are colour, and
  • How many of these are black & white

Doing this for a PDF file was not immediately obvious to me. Thankfully this is a well-trodden path, so the answer turned up after a bit of searching and piecing things together. I am writing it up here so that anyone going down the same path doesn't have to repeat the work.

The solution presented here is a shell script written for bash. I ran it in the terminal on macOS High Sierra, but it should be portable to other systems with bash. The target document is a PDF generated by LaTeX.

The script requires Ghostscript to analyse the colour of each PDF page; this should already be installed with your LaTeX distribution. It also uses pdfinfo, so unfortunately it won't run on Windows!
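The colour check can be done with Ghostscript's inkcov output device, which prints one line per page giving the ink coverage for cyan, magenta, yellow and black, followed by "CMYK OK". The function below is a minimal sketch, not the code from the gist: it reads that inkcov text on standard input and assumes a page counts as colour whenever any of its C, M or Y values is non-zero. The name count_from_inkcov is hypothetical.

```shell
#!/usr/bin/env bash

# count_from_inkcov (hypothetical helper): reads Ghostscript inkcov output on
# stdin and prints "<total> <b&w> <colour>" page counts.
# Assumption: a page is colour if any of its C, M or Y coverage is non-zero.
count_from_inkcov() {
  awk '
    # Per-page lines look like:
    #  0.00000  0.00000  0.00000  0.02230 CMYK OK
    $5 == "CMYK" {
      total++
      # Adding 0 forces a numeric comparison of the coverage fields.
      if ($1 + 0 > 0 || $2 + 0 > 0 || $3 + 0 > 0) colour++
    }
    END { printf "%d %d %d\n", total, total - colour, colour }
  '
}
```

Assuming gs is on your PATH, it can be fed directly with `gs -q -o - -sDEVICE=inkcov main.pdf | count_from_inkcov`. Note that the test is strict: any non-zero C, M or Y value, however small, marks the page as colour.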

Full script

For those who just want the code, here it is in a GitHub gist:

Usage

I wanted a shell script that I could call with a single argument, the file name, and that would report the total number of pages and the number of colour and black & white pages. To do this I made a file called count_colour_pages.sh containing the script. This file needs to be executable and in a directory on your $PATH so that you can call it from anywhere in your file system.

The output is then three lines in the terminal reporting these properties of the PDF.
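A script with this interface might look like the sketch below. It is not the code from the gist; it assumes pdfinfo reports a "Pages:" line, that Ghostscript's inkcov device is available, and that a page is colour if any of its C, M or Y coverage is non-zero. The labels are padded so the numbers line up as in the example output.

```shell
#!/usr/bin/env bash
# count_colour_pages.sh: a minimal sketch (not the original gist) that prints
# the total, black & white and colour page counts of one PDF file.
# Assumes Ghostscript (gs, with its inkcov device) and pdfinfo are installed.

count_colour_pages() {
  if [[ $# -ne 1 ]]; then
    printf 'Usage: %s file.pdf\n' "${0##*/}" >&2
    return 1
  fi
  local pdf=$1 pages colour

  # pdfinfo prints a line such as "Pages:          213".
  pages=$(pdfinfo "$pdf" | awk '/^Pages:/ { print $2 }')
  if [[ -z $pages ]]; then
    printf 'Could not read the page count of %s\n' "$pdf" >&2
    return 1
  fi

  # inkcov prints "C M Y K CMYK OK" for each page; a page is counted as
  # colour if its cyan, magenta or yellow coverage is non-zero.
  colour=$(gs -q -o - -sDEVICE=inkcov "$pdf" |
    awk '$5 == "CMYK" && ($1 + 0 > 0 || $2 + 0 > 0 || $3 + 0 > 0) { n++ }
         END { print n + 0 }')

  # Pad each label to 20 characters so the numbers line up.
  printf '%-20s%d\n' \
    "Number of pages:" "$pages" \
    "Number of b&w:" "$((pages - colour))" \
    "Number of colour:" "$colour"
}

# Run only when the file is executed directly, not when it is sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  count_colour_pages "$@"
fi
```

To call it by name, make it executable with `chmod +x count_colour_pages.sh` and move it into a directory on your $PATH (for example `~/bin`, if you have added that directory to your PATH).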

An example:

iMac: thesis-latex$ count_colour_pages.sh main.pdf
Number of pages:    213
Number of b&w:      141
Number of colour:   72

where main.pdf is my compiled LaTeX document. Despite the ridiculous length of theses, only about a third of the pages in this one (72 of 213) have any colour on them, which saved me a bunch when printing.