Counting the Words in a LaTeX Document

A word count is usually not an appropriate measure for a mathematical document. A better approach is to give guidance on the page size, line spacing and font size to be used, and then set a limit in pages after excluding certain material, e.g. figures, tables, appendices, and front and back matter.

Nevertheless, a word count, or at least an estimate of one, is sometimes required or of interest.

The standard Linux command wc counts the lines, words and characters (or bytes) in a file. However, this will grossly overestimate the word count of many LaTeX documents, because a large number of the "words" are actually LaTeX commands and maths. A more accurate estimate requires counting only the actual words in the document.
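As a rough illustration of the gap, the following shell sketch compares the raw wc figure with a filtered one on a two-line fragment. The sample text, the variable names and the sed filter are all assumptions made for this example. The filter is a deliberately crude stand-in and is not how the tools described below work.

```shell
#!/bin/sh
# Hypothetical sample: a two-line LaTeX fragment written to a temporary file.
sample=$(mktemp)
printf '%s\n' '\section{Introduction}' 'Let $x \in \mathbb{R}$ be given.' > "$sample"

# Raw count: every whitespace-separated token, markup included.
raw_words=$(wc -w < "$sample" | tr -d ' ')

# Crude filtered count (illustration only):
#   1. drop inline maths of the form $...$
#   2. drop command names such as \section
#   3. turn braces into spaces
text_words=$(sed -e 's/\$[^$]*\$//g' \
                 -e 's/\\[A-Za-z]*//g' \
                 -e 's/[{}]/ /g' "$sample" | wc -w | tr -d ' ')

echo "raw: $raw_words, filtered: $text_words"   # prints "raw: 7, filtered: 4"
rm -f "$sample"
```

Here wc sees seven tokens, while only four of them (Introduction, Let, be, given) are words a reader would count. A filter this simple will mishandle display maths, comments, macro arguments and much else, which is why the dedicated tools are preferable.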

Note that the accuracy of any of these methods is likely to depend on how your document has been written, in particular on its use of LaTeX macros, and on what your, or your examiners', definition of a word count is. In the latter case, only your examiners or academic administration team can formally say how a word count will actually be assessed or applied.

Using Overleaf

Obtaining a word count in Overleaf is as simple as selecting it from the menu. It uses the texcount utility (see below for some more info on that).

Using Kile

Kile is a LaTeX editor. If you open your document in Kile and then select Statistics from the File menu, you will find a word count along with other statistics.

Using Texmaker

Texmaker's integrated PDF viewer has a word count feature: just right-click in the PDF document and select Number of words in the document.

Using untex

Use untex first to remove the tex codes and then count the words, e.g.

untex file.tex | wc -w

The accuracy of the estimate will depend to a degree on how many LaTeX macros of your own you have that untex fails to handle well.
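If the document is split across several files, the same pipeline can be run on each file in turn. The following is a minimal sketch, assuming untex is on your PATH and accepts a file name as in the command above. The function name count_untex and the chapter file names are hypothetical.

```shell
#!/bin/sh
# count_untex FILE...: print "<words><TAB><file>" for each file, then a total.
# Assumes untex is installed and takes a file name argument.
count_untex() {
    if ! command -v untex >/dev/null 2>&1; then
        echo "untex not found on PATH" >&2
        return 127
    fi
    total=0
    for f in "$@"; do
        # Same pipeline as above; tr strips the padding some wc versions add.
        n=$(untex "$f" | wc -w | tr -d ' ')
        printf '%s\t%s\n' "$n" "$f"
        total=$((total + n))
    done
    printf '%s\ttotal\n' "$total"
}

# Example call (hypothetical file names):
# count_untex intro.tex methods.tex results.tex
```

Listing the files explicitly also makes it easy to leave out material, such as appendices, that should not contribute to the count.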

Using TeXcount

TeXcount is another tool that parses the LaTeX document and counts the words, e.g. one can run

texcount.pl -inc -html -v -sum file.tex > results.html

which produces an HTML file that you can view in a web browser to see the overall counts it has done and what parts of the document it has included or excluded for the count.

Note that TeXcount also provides a web-based word count service.

As with untex, the accuracy of the estimate will depend to a degree on how many LaTeX macros of your own you have that it fails to handle well.
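When only a single figure is wanted, for example in a script, the HTML report is more than is needed. The following is a minimal sketch, assuming the -sum and -1 options behave as described in the TeXcount documentation (sum the separate counts, and restrict output to one line). The function name texcount_total is hypothetical, and the script may be installed as either texcount or texcount.pl depending on your system.

```shell
#!/bin/sh
# texcount_total FILE: print one number, the sum of all TeXcount counts.
# Assumes the script is installed as either texcount or texcount.pl.
texcount_total() {
    for cmd in texcount texcount.pl; do
        if command -v "$cmd" >/dev/null 2>&1; then
            # -inc  also parse included files
            # -sum  add the separate counts into one total
            # -1    restrict the output to a single line
            "$cmd" -inc -sum -1 "$1"
            return
        fi
    done
    echo "texcount not found on PATH" >&2
    return 127
}

# Example call:
# texcount_total file.tex
```

It is still worth producing the verbose HTML report at least once, to check which parts of the document were included in or excluded from the count.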

Using a PostScript or PDF document

An alternative approach is to try to count the words in the PostScript or PDF file by converting it to plain text first, e.g.

dvips -o - file.dvi | ps2ascii | wc -w

pdftotext file.pdf - | grep -E -e '\w\w\w+' | iconv -f ISO-8859-15 -t UTF-8 | wc -w

Last updated on 24 Mar 2023 16:10.