Hey, Scripting Guy! How Do I Count the Number of Words in a Group of Office Word Documents?

Hey, Scripting Guy! Question

Hey, Scripting Guy! I have a folder that contains a number of Microsoft Word documents that are all related to a specific project. I need to know the number of words in all of those Word documents. This is because the project is billable based upon the number of words. In the past, I used Explorer to open each file to get the word count of each Microsoft Word document. I then added them up with Calculator. This was a time-consuming and tedious process. But when you are dealing with money, you do not mind making a little extra effort. The problem is that this new project is very large, with hundreds of document files. To make matters worse, the project folder also has Microsoft Excel files, Microsoft Project files, bitmap images, and assorted Notepad text files. It is a mess, and I do not want to waste several hours on this if I can help it. Can you help me please?

-- WS

Hey, Scripting Guy! Answer

Hello WS,

Microsoft Scripting Guy Ed Wilson here. You know, contrary to popular belief, Scripting Guys are not paid by the word. A long time ago when I wrote for a newspaper, I was paid by the word. I have also been paid by the word for magazine articles, so your question is not something that is completely off the wall. In addition to being paid by the word, when I write a proposal for a new book project, I must supply an estimated word count for the new book. This is used to determine the estimated page count, which the book publisher can use as a guide for setting the cost of the new book. When writing the book, if my chapters are tending to be either too long or too short, my developmental editor will start to get excited.

Currently, I am not too excited. I am sipping a cup of English breakfast tea with a cinnamon stick in it and a bit of lemon grass. I am listening to “Travelin' Blues” by Dave Brubeck on my Zune.

Truth in Blogging notice to comply with recent USA FTC requirements: I purchased my Zune using my own money. It was not a gift, a bribe, or a prize for blogging about the device. I purchased the Scripting Wife's Zune as well. I bought both of these devices because they are cool!

The grass outside is damp from the evening’s rain, and patches of fog lounge on the lawn like deer near a stream in the deep woods. While listening to “Linus and Lucy,” also by Dave Brubeck, I wrote the CountWordsInWord.ps1 script seen here.

CountWordsInWord.ps1

Function Set-Variables
{
 $folderpath = "c:\fso\*"
 $fileTypes = "*.docx","*.doc"
 $confirmConversion = $false
 $readOnly = $true
 $addToRecent = $false
 $passwordDocument = "password"
 $wordCountFile = "C:\fso\wordCount.csv"
 $numberOfWords = 0
 Set-OutputFile
} #end Set-Variables

Function Set-OutputFile
{
 if(Test-Path -path $wordCountFile)
   { Remove-Item -path $wordCountFile }
 "name,wordCount" >> $wordCountFile
 Get-WordDocuments
} #end Set-OutputFile

Function Get-WordDocuments
{
  "Counting Words in Word Docs in $folderPath"
 $word = New-Object -ComObject word.application
 $word.visible = $false
 Get-ChildItem -path $folderpath -include $fileTypes |
 foreach-object `
  {
   $path =  ($_.fullname).substring(0,($_.FullName).lastindexOf("."))
   $doc = $word.documents.open($_.fullname, $confirmConversion, $readOnly,
   $addToRecent,   $passwordDocument)
   "  $($_.name), $($doc.words.count)"  >> $wordCountFile
   $doc.close()
  } #end Foreach-Object
 $word.Quit()
 Get-WordCount
} #end Get-WordDocuments

Function Get-WordCount
{
 $wdcsv = @(import-csv -path $wordCountFile)  # @() guarantees an array, even for a single row
 for ($i = 0 ; $i -le $wdcsv.length -1 ; $i++)
 {
  $numberOfWords += [int32]$wdcsv[$i].wordCount
 }
 $numberOfWords
} #end Get-WordCount

# *** Entry Point to Script ***

Set-Variables

The CountWordsInWord.ps1 script contains a series of functions, each of which calls another function until the word count report is produced. This is not done to promote code reuse (which is one reason for writing functions), but to make the script easier to read and to understand (another reason for writing functions). You will notice the script is a bit complicated. If a similar script were written using VBScript, it would not be significantly different. The main advantage of using Windows PowerShell for this script is that the Get-ChildItem cmdlet is easier to use than the FileSystemObject. We have a few other advantages due to the more compact syntax of Windows PowerShell over VBScript. The complexity of the script is due to using the Microsoft Word automation objects to access the word count of the Word documents.

The first function in the CountWordsInWord.ps1 script is called Set-Variables. As the name implies, it sets the values of variables that will be used in the script. A variable table is a good way to keep track of a large number of variables in a script. The variable table for the CountWordsInWord.ps1 script is seen in Table 1.

Table 1  Variable Table

Variable            Initial value            Use in script
--------            -------------            -------------
$folderpath         "c:\fso\*"               Path to search for Word documents.
$fileTypes          "*.docx","*.doc"         Two types of Word documents to look for.
$confirmConversion  $false                   Used by Open method of Word. Do not prompt if conversion is needed.
$readOnly           $true                    Used by Open method of Word. Open document read-only.
$addToRecent        $false                   Used by Open method of Word. Do not add to recently used.
$passwordDocument   "password"               Used by Open method of Word. The password to use for any password protected documents.
$wordCountFile      "C:\fso\wordCount.csv"   The path for the output file that holds file names and word count.
$numberOfWords      0                        Used to hold the total number of words for all Word documents.

After all of the variables have been initialized, the Set-Variables function calls the Set-OutputFile function. Because the Set-OutputFile function is called from within the Set-Variables function, all of the variables are available to it: the Set-Variables function becomes the parent scope for the Set-OutputFile function. Variables that are created in the Set-Variables scope are visible in that scope and in any child scopes. If, of course, the variables were marked as private, they would be available only within the Set-Variables function. WS, you will need to modify the value of the $folderpath variable and the $wordCountFile variable to match your computer. The Set-Variables function is seen here:

Function Set-Variables
{
 $folderpath = "c:\fso\*"
 $fileTypes = "*.docx","*.doc"
 $confirmConversion = $false
 $readOnly = $true
 $addToRecent = $false
 $passwordDocument = "password"
 $wordCountFile = "C:\fso\wordCount.csv"
 $numberOfWords = 0
 Set-OutputFile
} #end Set-Variables
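
If the scope behavior is new to you, here is a minimal sketch you can paste into the Windows PowerShell console. The function and variable names (Show-Parent, Show-Child, $greeting, and $secret) are made up for illustration and are not part of the CountWordsInWord.ps1 script. It shows a child function reading an ordinary variable from its caller, and failing to see one that was marked as private.

```powershell
# Hypothetical functions, used only to illustrate parent and child scopes.
Function Show-Child
{
 # Neither variable is defined here, so PowerShell looks in the caller's scope.
 "greeting=[$greeting] secret=[$secret]"
} #end Show-Child

Function Show-Parent
{
 $greeting = "hello"          # ordinary variable: visible to child scopes
 $private:secret = "hidden"   # private variable: visible only inside Show-Parent
 Show-Child
} #end Show-Parent

$result = Show-Parent
$result                       # greeting=[hello] secret=[]
```

The empty brackets after secret are the point: the private variable never reaches Show-Child.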

In the Set-OutputFile function, the path to the $wordCountFile file is checked. If the file exists, it is deleted. A new file is then created by using the redirection arrows. The header for the wordCount.csv file is name and wordCount. The header for the file is written at the same time that the file is created. After creating the wordCount.csv file, the Set-OutputFile function calls the Get-WordDocuments function. The Set-OutputFile function is shown here:

Function Set-OutputFile
{
 if(Test-Path -path $wordCountFile)
   { Remove-Item -path $wordCountFile }
 "name,wordCount" >> $wordCountFile
 Get-WordDocuments
} #end Set-OutputFile

The Get-WordDocuments function is the function that interacts with the Word automation object model. The first thing the Get-WordDocuments function does after displaying a status message on the Windows PowerShell console is create an instance of the word.application object. This object is the main object you use when working with Microsoft Word. The word.application object has a number of methods and properties available to it, which are all documented on MSDN. The visible property of the application object is set to $false because there is no need to pop up a hundred or more Microsoft Word documents when all you want to do is to obtain the word count from the document. This part of the Get-WordDocuments function is seen here:

 "Counting Words in Word Docs in $folderPath"
 $word = New-Object -ComObject word.application
 $word.visible = $false

You can use the Get-ChildItem cmdlet from Windows PowerShell to obtain a listing of all of the Microsoft Word documents in the specified folder. Each file that is found is piped to the ForEach-Object cmdlet, where the path to the file, minus its extension, is stored in the $path variable. This is seen here:

Get-ChildItem -path $folderpath -include $fileTypes |
 foreach-object `
  {
   $path =  ($_.fullname).substring(0,($_.FullName).lastindexOf("."))
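
The trailing \* in the folder path matters here. The Get-ChildItem documentation describes the include parameter as effective only when the path refers to the contents of a folder (or when recurse is used). The following is a small sketch, assuming a throwaway folder in the temp directory with made-up file names, that shows the filter picking out only the Word documents from a mixed folder.

```powershell
# Build a throwaway folder with mixed file types (hypothetical file names).
$demo = Join-Path ([System.IO.Path]::GetTempPath()) ("fso_" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $demo | Out-Null
"a.docx","b.doc","c.xlsx","d.txt","e.bmp" |
 ForEach-Object { New-Item -ItemType File -Path (Join-Path $demo $_) | Out-Null }

# The wildcard path makes Get-ChildItem look at the folder contents,
# so -Include can filter them by name.
$found = Get-ChildItem -Path (Join-Path $demo "*") -Include "*.docx","*.doc" |
 Sort-Object Name | ForEach-Object { $_.Name }
$found   # a.docx and b.doc only

# Clean up the throwaway folder.
Remove-Item -Path $demo -Recurse -Force
```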

After the file has been found, its full path (the FullName property of the piped file) is passed to the open method from the documents collection. The documents collection object contains the open method and is retrieved by querying the documents property from the word.application object. After the document is open, the words collection is used to retrieve the count property. The file name and the word count are written to the word count file. This section of the Get-WordDocuments function is shown here:

   $doc = $word.documents.open($_.fullname, $confirmConversion, $readOnly,
   $addToRecent,   $passwordDocument)
   "  $($_.name), $($doc.words.count)"  >> $wordCountFile
   $doc.close()
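
One thing to watch with a hand-built line like the one above: a file name that contains a comma would split into extra columns. A possible alternative, sketched here under the assumption that you are running Windows PowerShell 3.0 or later (for the [pscustomobject] shortcut), is to collect one object per document and let Export-Csv handle the quoting. The rows and the temp file path are made-up sample values.

```powershell
# Sample rows standing in for the name and count gathered in the loop (made-up values).
$rows = @(
 [pscustomobject]@{ name = "Budget, final.docx"; wordCount = 120 }
 [pscustomobject]@{ name = "Notes.doc"; wordCount = 45 }
)

# Export-Csv writes the header row and quotes values that contain commas.
$csv = Join-Path ([System.IO.Path]::GetTempPath()) ("wordCount_" + [guid]::NewGuid() + ".csv")
$rows | Export-Csv -Path $csv -NoTypeInformation

# Read the file back to confirm the name survived intact.
$back = @(Import-Csv -Path $csv)
$back[0].name   # Budget, final.docx

Remove-Item -Path $csv
```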

The last thing the Get-WordDocuments function does is call the Get-WordCount function. The complete Get-WordDocuments function is seen here:

Function Get-WordDocuments
{
  "Counting Words in Word Docs in $folderPath"
 $word = New-Object -ComObject word.application
 $word.visible = $false
 Get-ChildItem -path $folderpath -include $fileTypes |
 foreach-object `
  {
   $path =  ($_.fullname).substring(0,($_.FullName).lastindexOf("."))
   $doc = $word.documents.open($_.fullname, $confirmConversion, $readOnly,
   $addToRecent,   $passwordDocument)
   "  $($_.name), $($doc.words.count)"  >> $wordCountFile
   $doc.close()
  } #end Foreach-Object
 $word.Quit()
 Get-WordCount
} #end Get-WordDocuments
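
A caution if the numbers are going on an invoice: the count property of the words collection also counts punctuation and paragraph marks, so it can come out higher than the figure Word shows in its own Word Count dialog box. The document object also has a ComputeStatistics method, and passing it the value 0 (the wdStatisticWords constant) asks for the word statistic. The helper below is a hypothetical sketch, not part of the original script, and I have not run it against every version of Word.

```powershell
# Hypothetical helper, not part of the original CountWordsInWord.ps1 script.
Function Get-DocumentWordCount
{
 param($Document)
 $wdStatisticWords = 0   # numeric value of the wdStatisticWords constant
 $Document.ComputeStatistics($wdStatisticWords)
} #end Get-DocumentWordCount

# Possible use inside the ForEach-Object block, in place of $doc.words.count:
#   "  $($_.name), $(Get-DocumentWordCount -Document $doc)"  >> $wordCountFile
```

Whichever count you use, it is worth agreeing on it with whoever pays the bill before you run the script over hundreds of files.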

The Get-WordCount function is used to read the contents of the $wordCountFile. The wordcount.csv file from my computer is seen here:

Image of the wordcount.csv file

The Import-Csv cmdlet returns a custom object for each row in the .csv file. The property names of the object are derived from the column heads. This is seen here:

PS C:\> $a = Import-Csv C:\fso\wordCount.csv
PS C:\> $a

name                                    wordCount
----                                    ---------
Bios.docx                               38
Chapter3.doc                            11692
hyper.doc                               24
ServiceTest.doc                         1017
SharkPic.docx                           18
test.doc                                217
test2.doc                               138
TestDocumentProtected.docx              11
__Chapter_Template.doc                  242
__Template.doc                          2

PS C:\> $a | gm

   TypeName: System.Management.Automation.PSCustomObject

Name        MemberType   Definition
----        ----------   ----------
Equals      Method       System.Boolean Equals(Object obj)
GetHashCode Method       System.Int32 GetHashCode()
GetType     Method       System.Type GetType()
ToString    Method       System.String ToString()
name        NoteProperty System.String name=Bios.docx
wordCount   NoteProperty System.String wordCount=38

PS C:\>

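You can index into the collection of objects that Import-Csv returns and retrieve the word count of each Word document. Because the wordCount property is a string, it is cast to [int32] before it is added to the total. The Get-WordCount function is shown here: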

Function Get-WordCount
{
 $wdcsv = @(import-csv -path $wordCountFile)  # @() guarantees an array, even for a single row
 for ($i = 0 ; $i -le $wdcsv.length -1 ; $i++)
 {
  $numberOfWords += [int32]$wdcsv[$i].wordCount
 }
 $numberOfWords
} #end Get-WordCount
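
If you would rather not write the loop yourself, the Measure-Object cmdlet can do the adding. This is a sketch of an alternative, not the method the script uses. It writes a small sample file to the temp folder (the three rows are borrowed from the output shown earlier) so that it runs without touching your real wordCount.csv file.

```powershell
# Write a small sample file with the same two columns (sample values only).
$sample = Join-Path ([System.IO.Path]::GetTempPath()) ("wordCount_" + [guid]::NewGuid() + ".csv")
"name,wordCount","Bios.docx,38","hyper.doc,24","test.doc,217" | Set-Content -Path $sample

# Measure-Object converts the wordCount strings to numbers and adds them up.
$total = (Import-Csv -Path $sample | Measure-Object -Property wordCount -Sum).Sum
$total   # 279

Remove-Item -Path $sample
```

Measure-Object also behaves the same way whether the file holds one row or a thousand, so there is no array indexing to worry about.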

The last thing to do is to call the Set-Variables function and set the entire script into motion like tipping over a domino. This is shown here:

# *** Entry Point to Script ***

Set-Variables

Well, WS, that is all there is to retrieving the word count from all the Microsoft Word documents in a particular folder. Join us tomorrow as Office Word Week continues.

If you want to know exactly what we will be looking at tomorrow, follow us on Twitter or Facebook. If you have any questions, send e-mail to us at scripter@microsoft.com or post them on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson and Craig Liebendorfer, Scripting Guys

 

Comments
  • I have a huge problem...I have to count words in my novel ASAP.  If it was in Word Perfect I would go to "Properties" and have it there.  I don't know where to go with Microsoft Word.  I can't convert to WP because I have to send it in Word format.  What do I do to get word count in Microsoft Word??

  • @BabeBennet It depends on what version of Microsoft Word you are using. I use Microsoft Word 2010, and the word count appears in the lower left corner of the page, next to the page number. For other versions, I cannot remember, and I do not have access to them. You can always get good help on using Microsoft Office products by going to the Microsoft Office web site: office.microsoft.com/en-us is the link I use.

  • I believe on all versions of Word under 'Review' select 'Word Count'.

    This is also exposed on the document object.