
Hey, Scripting Guy! Question

Hey, Scripting Guy! I need to add hyperlinks to a document by using Windows PowerShell. But I do not want to add hyperlinks to objects that are set in a code style or a heading style. I only want to add the hyperlink to the word if it is in a normal Microsoft Word style. In addition, I only want to add the link one time per document. Can this be done, and if so how tough would it be to create such a script? I would be forever grateful because the script would save me an awful lot of time every week.

-- EW

 

Hey, Scripting Guy! Answer

Hello EW,

Microsoft Scripting Guy Ed Wilson here. It is another hot humid day in Charlotte, North Carolina, in the United States. Today is a pretty cool day, however, because I have a couple of meetings that are always the highlight of my week. The first meeting is my weekly meeting with my alter ego, Craig; I always come away from those meetings inspired with great ideas, projects, and cool things we can do to better interact with the community. In addition, Craig always has cool ideas about things we can do around the Script Center to make things easier to find. The second meeting is with our site manager, Ian. Ian is new to our team but he is quickly becoming the go-to person for helping to get our Script Center spruced up. The other meeting that I always enjoy is the Knowledge Engineers meeting. Knowledge Engineers are a group of people who are exploring various ways to interact with the community.

Well, I digress. I decided I needed my large teapot today because I will be on Live Meeting for a total of four and a half hours today. Earl Grey tea is my favorite for afternoon meetings, especially when accompanied by a small plate stacked with ANZAC biscuits.

EW, because I like to pay attention when I am in meetings, I want to go ahead and get your Windows PowerShell script written before my first meeting. The complete ReadTextFileAddHyperLinksToDocument.ps1 script is shown here.

ReadTextFileAddHyperLinksToDocument.ps1

[cmdletBinding()]
Param(
    $wordFile = "C:\fso1\words.CSV",
    $docPath = "C:\fso1\TestHSG.Doc"
) #end param
add-type -AssemblyName "Microsoft.Office.Interop.Word"
$wdunits = "Microsoft.Office.Interop.Word.wdunits" -as [type]
$application = New-Object -comobject word.application
$application.visible = $False
$words = Import-Csv $wordFile
$docs = Get-childitem -Path $docPath -Include *.doc,*.docx -Recurse
Foreach ($doc in $docs)
{
    Write-Verbose -Message "Processing $($doc.fullname)"
    $document = $application.documents.open($doc.FullName)
    $range = $document.content
    $null = $range.movestart($wdunits::wdword,$range.start)
    $matchCase = $false
    $matchWholeWord = $true
    $matchWildCards = $false
    $matchSoundsLike = $false
    $matchAllWordForms = $false
    $forward = $true
    $wrap = 1
    Foreach ($word in $words)
    {
        $findText = $word.Word
        write-verbose -Message "looking for $($findText)"
        $wordFound = $range.find.execute($findText,$matchCase,
            $matchWholeWord,$matchWildCards,$matchSoundsLike,
            $matchAllWordForms,$forward,$wrap)
        Write-Verbose -Message "$($findText) returned $wordFound"
        if($wordFound)
        {
            if($Range.style.namelocal -eq "normal")
            {$null = $document.HyperLinks.Add($Range, $word.URL,$null,$null,$FindText)}
            ELSE
            {
                Write-Verbose -Message `
                    "$($findText) not modified because it is $($Range.style.namelocal)"
            }
        } #end if $wordFound
        $range = $document.content
        $null = $range.movestart($wdunits::wdword,$range.start)
        $wordFound = $false
    } #end foreach $word
    $document.save()
    $document.close()
} #end foreach $doc
$application.quit()
Remove-Variable -Name application
[gc]::collect()
[gc]::WaitForPendingFinalizers()

Suppose you have a Microsoft Word document such as the one seen in the following image, and you want to add a large number of hyperlinks to it. Rather than opening it up and doing everything manually, using a script makes more sense.

Image of an Office Word document

The first thing that needs to be done is to create the command-line parameters. In addition, you will want to specify cmdletBinding to allow the script to take advantage of common parameters. By taking advantage of common parameters, you do not need to add additional code to your script to permit it to properly support parameters such as verbose. The param statement is used to create two command-line parameters: the first is the path to the word file and the second is the path to the document to be processed. The value of –wordfile is a CSV file that consists of two values: word and URL. An example of such a CSV file is shown in the following image.

Image of example CSV file
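
As a rough illustration of that layout, the following sketch writes a two-column CSV file to a temporary folder and reads it back with Import-Csv. The file name, the words, and the example.com URLs are placeholders invented for this sketch; they are not values from the original script.

```powershell
# Placeholder path and data, invented for illustration only.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'words-sample.csv'

# The header row supplies the property names (Word and URL) that the
# script later reads from each imported row.
$lines = @(
    'Word,URL'
    'PowerShell,https://example.com/powershell'
    'cmdlet,https://example.com/cmdlet'
)
Set-Content -Path $csvPath -Value $lines

$words = Import-Csv -Path $csvPath
foreach ($word in $words)
{
    "{0} -> {1}" -f $word.Word, $word.URL
}
```

Each imported row is an object with a Word property and a URL property, which is why the script can refer to $word.Word and $word.URL.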

The –docpath parameter can point to a single Microsoft Word document or a folder that contains Microsoft Word documents. If you supply a folder, you only need to supply the folder path. The first few times you run the script, you will probably wish to run it with the –verbose common parameter. This is shown here:

PS C:\> C:\SG\ReadTextFileAddHyperLinksToDocument.ps1 -docPath c:\fso -Verbose

When the script is run with the –verbose parameter, the output appears as shown in the following image.

Image of script output from running script with -verbose parameter

The parameter section of the script is shown here:

[cmdletBinding()]
Param(
$wordFile = "C:\fso1\words.CSV",
$docPath = "C:\fso1\TestHSG.Doc"
) #end param
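
To see what cmdletBinding contributes, here is a minimal sketch that uses a function rather than a script file. Show-VerboseDemo is a hypothetical name made up for this illustration; the attribute and the Write-Verbose call work the same way as in the script.

```powershell
# Show-VerboseDemo is a hypothetical function used only to illustrate the attribute.
function Show-VerboseDemo
{
    [CmdletBinding()]
    Param(
        $docPath = "C:\fso1\TestHSG.Doc"
    ) #end param
    # Emitted only when the caller adds -Verbose
    # (assuming the default $VerbosePreference).
    Write-Verbose -Message "Processing $docPath"
    "Finished $docPath"
}

Show-VerboseDemo            # prints only the Finished line
Show-VerboseDemo -Verbose   # also prints a VERBOSE: Processing line
```

No code in the function inspects a verbose switch; the common parameter is supplied by the cmdletBinding attribute.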

Next, the Add-Type cmdlet is used to ensure the Microsoft Word interop assembly is loaded. Then, the WdUnits enumeration is created. The Microsoft Word Application Object is created, and the visible property of the application object is set to False. This is shown here:

add-type -AssemblyName "Microsoft.Office.Interop.Word"
$wdunits = "Microsoft.Office.Interop.Word.wdunits" -as [type]
$application = New-Object -comobject word.application
$application.visible = $False

Because the word list is stored as a comma-separated value (CSV) file, the Import-CSV cmdlet is used to import the contents into the $words variable. The Get-ChildItem cmdlet is used to retrieve the matching FileInfo objects and store them in the $docs variable. This command works even if the –docpath parameter points to a specific file rather than a folder. This section of the script is shown here:

$words = Import-Csv $wordFile
$docs = Get-ChildItem -Path $docPath -Include *.doc,*.docx -Recurse
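
The filtering done by -Include can be tried without any real Word documents. The sketch below builds a throwaway folder of empty placeholder files (the folder and file names are invented for this example) and runs the same Get-ChildItem call shape against it.

```powershell
# Build a throwaway folder with empty placeholder files (names are invented).
$folder = Join-Path ([System.IO.Path]::GetTempPath()) ("hsg-demo-" + [guid]::NewGuid())
$null = New-Item -ItemType Directory -Path $folder
foreach ($name in 'a.doc', 'b.docx', 'c.txt')
{
    $null = New-Item -ItemType File -Path (Join-Path $folder $name)
}

# Same call shape as the script: only the .doc and .docx files come back.
$docs = Get-ChildItem -Path $folder -Include *.doc,*.docx -Recurse
$docs | ForEach-Object { $_.Name }

# Clean up the placeholder folder.
Remove-Item -Path $folder -Recurse
```

The text file is skipped, so the Foreach loop in the script only ever tries to open files with a Word extension.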

When the script is run with the –verbose parameter, a status message displays the name of the document currently being processed. In addition, a message about an interop assembly is displayed. It begins, “The object written to the pipeline is an instance of the type…” even though the object is not piped anywhere. The message also cautions that scripts might not work if the primary interop assembly is not installed; in this case, the interop assembly is loaded. Keep in mind this message is verbose output, not a warning or an error. A Microsoft Word Document object is returned when the Open method of the Documents collection is called. The Open method has a number of optional parameters, including one for supplying a password. The Document object is stored in the $document variable. The verbose output is shown here:

VERBOSE: The object written to the pipeline is an instance of the type
"Microsoft.Office.Interop.Word.ApplicationClass" from the component's primary
interop assembly. If this type exposes different members than the IDispatch members,
scripts written to work with this object might not work if the primary interop
assembly is not installed.
VERBOSE: Processing C:\fso\HSG-6-1-10.Doc

This section of the script is shown here:

Foreach ($doc in $docs)
{
Write-Verbose -Message "Processing $($doc.fullname)"
$document = $application.documents.open($doc.FullName)

The Document object’s Content property returns a Range object that represents the main document story. This Range object will be used to work with the text in the document. The Range object’s MoveStart method moves the start position of the range. It takes two arguments: the unit of measure and the number of units to move. In this example, words are used as the unit, specified as a value from the WdUnits enumeration. The count is the Range object’s Start property; because the Content range begins at the start of the document, that value is 0 and the range stays anchored at the beginning of the document. This section of the script is seen here:

$range = $document.content
$null = $range.movestart($wdunits::wdword,$range.start)

Now a number of variables used for the parameters for the Execute method of the Find object are initialized. These are shown here:

$matchCase = $false
$matchWholeWord = $true
$matchWildCards = $false
$matchSoundsLike = $false
$matchAllWordForms = $false
$forward = $true
$wrap = 1
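
The value 1 assigned to $wrap corresponds to wdFindContinue in Word's WdFindWrap enumeration. As a small sketch of how the number could be made self-documenting, the hash table below spells out the three enumeration values as plain integers. The $wdFindWrap variable is an illustrative name, not part of the original script.

```powershell
# Values of Word's WdFindWrap enumeration, written as plain integers so this
# sketch runs even where the Word interop assembly is not installed.
$wdFindWrap = @{
    wdFindStop     = 0   # stop when the end of the search range is reached
    wdFindContinue = 1   # continue the search past the end of the range
    wdFindAsk      = 2   # prompt the user before continuing
}

$wrap = $wdFindWrap.wdFindContinue
"wrap = $wrap"
```

When the interop assembly is loaded, as it is in the script, the same value should also be available from the enumeration type itself, in the same way that the script obtains WdUnits.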

Then each word in the array of words from the words.CSV file is processed. The value of the word property is assigned to the $findText variable. A verbose status message is displayed on the Windows PowerShell console when the script is run with the –verbose parameter. The Find object’s Execute method is used to locate the first occurrence of the word. The Execute method returns a Boolean value.

When the Find object is used from a Range object (as is the case here), the Range object is redefined to match the result of the Find operation. This is the reason for redefining the range later on in the script. If the Find object is created from a Selection object, the selection is changed when the text that matches the find criteria is found. If the Microsoft Word document is visible when using find from a Selection object, the matching text will change colors. The active selection could then be used to obtain a Range object to use when adding a hyperlink. By using a Range object to launch find, I remove one step in the process.

This section of the script is shown here.

Foreach ($word in $words)
{
$findText = $word.Word
write-verbose -Message "looking for $($findText)"
$wordFound = $range.find.execute($findText,$matchCase,
$matchWholeWord,$matchWildCards,$matchSoundsLike,
$matchAllWordForms,$forward,$wrap)

It is time to display another status message. If the word is found, the Execute method of the Find object returns $True, and that value is stored in the $wordFound variable.

Searching a Microsoft Word document by using the Find object was covered in an April Hey, Scripting Guy! Blog post. In addition to interesting content, there is an excellent picture of a Green Moray eel that I took while scuba diving in Boca Raton, Florida.

If the style of the text that was located is normal, I want to add a hyperlink. If the text is not normal—for example, if it is a heading or a code style—I do not want to insert a hyperlink. The Hyperlinks object is used to work with hyperlinks in a Microsoft Word document.
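
The script compares the style name with the lowercase string "normal", although Word's built-in style is named Normal. The sketch below shows why that works, using a plain string as a stand-in for $Range.style.namelocal. The $linkableStyles list is a hypothetical extension and is not in the original script. Note also that NameLocal returns the localized style name, so on a non-English installation of Word the comparison string would presumably need to change.

```powershell
# Word's built-in style is named "Normal"; -eq ignores case for strings,
# so the lowercase literal used in the script still matches.
$styleName = 'Normal'            # stand-in for $Range.style.namelocal
$styleName -eq 'normal'          # True
$styleName -ceq 'normal'         # False: -ceq is the case-sensitive form

# A hypothetical extension: allow more than one style to receive links.
$linkableStyles = 'Normal', 'Body Text'
$linkableStyles -contains $styleName   # True
```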

Inserting hyperlinks into a Microsoft Word document was discussed in a May 2009 Hey, Scripting Guy! Blog post. Interestingly enough, that article has a really cool picture of a green moray eel that I took while scuba diving off Kona on the Big Island of Hawaii.

To insert a hyperlink, use the Hyperlinks object’s Add method. This section of the script is shown here:

Write-Verbose -Message "$($findText) returned $wordFound"
if($wordFound)
{
if($Range.style.namelocal -eq "normal")
{$null = $document.HyperLinks.Add($Range, $word.URL,$null,$null,$FindText)}

When the text is not in the Normal style, a verbose message states the style of the text and the fact that the hyperlink was not added. This is shown here:

ELSE
{
Write-Verbose -Message `
"$($findText) not modified because it is $($Range.style.namelocal)"
}
} #end if $wordFound
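
One consequence of this logic is that only the first occurrence of each word is tested. If that occurrence sits in a heading or a code style, the word receives no hyperlink anywhere in the document. The following is a sketch of one possible variation, not the original script's method: it keeps searching until it finds an occurrence in the Normal style, and then stops so the link is still added only once. Add-FirstNormalHyperlink is a hypothetical function name. The sketch assumes that collapsing the range to its end and calling Execute again resumes the search from that point, and it has not been tested against every document layout (for example, a match at the very end of a document).

```powershell
# Hypothetical helper: link the first occurrence of $Text that is in the Normal style.
function Add-FirstNormalHyperlink
{
    Param(
        $Document,          # an open Word Document object
        [string]$Text,      # the word to search for
        [string]$Url        # the address for the hyperlink
    )
    $wdFindStop = 0         # WdFindWrap value: stop at the end of the range
    $wdCollapseEnd = 0      # WdCollapseDirection value: collapse to the end

    $range = $Document.Content
    # Arguments: text, matchCase, matchWholeWord, matchWildcards,
    # matchSoundsLike, matchAllWordForms, forward, wrap
    while ($range.Find.Execute($Text, $false, $true, $false, $false, $false, $true, $wdFindStop))
    {
        if ($range.Style.NameLocal -eq "normal")
        {
            $null = $Document.Hyperlinks.Add($range, $Url, $null, $null, $Text)
            return $true    # one link per document is enough
        }
        # Skip past this match so the next Execute call looks further on.
        $null = $range.Collapse($wdCollapseEnd)
    }
    return $false           # no occurrence in the Normal style was found
}
```

Inside the Foreach ($word in $words) loop, a call such as Add-FirstNormalHyperlink -Document $document -Text $word.Word -Url $word.URL could stand in for the Execute call and the if/else block, with the returned Boolean used for the verbose message.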

After working with the hyperlink, it is time to reset the Range object to the contents of the document. The start position of the Range object is moved to the start of the range, and the $wordFound variable is set to $false. The script will then loop around to the next word in the array. This section of the script is shown here:

$range = $document.content
$null = $range.movestart($wdunits::wdword,$range.start)
$wordFound = $false
} #end foreach $word

After all the words from the words.CSV file have been processed, it is time to save the document and close it. The code that performs these two operations is shown here:

$document.save()
$document.close()
} #end foreach $doc

When all of the documents have been processed, it is time to release the resources. This technique was discussed in Tuesday’s Hey, Scripting Guy! Blog post, so the explanation is not repeated here. The code is shown here:

$application.quit()
Remove-Variable -Name application
[gc]::collect()
[gc]::WaitForPendingFinalizers()

When the script has finished processing a document, the hyperlinks that were added appear as shown in the following image.

Image of document with added hyperlinks


EW, that is all there is to using Windows PowerShell 2.0 to add hyperlinks to a Microsoft Word document. This also concludes our articles for Microsoft Office Week (well, ok, it turned out to be Microsoft Word week). Join us tomorrow when we dig into the virtual mailbag for Quick-Hits Friday—I have selected some awesome topics.

If you want to know exactly what we will be looking at tomorrow, follow us on Twitter or Facebook. If you have any questions, send e-mail to us at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson and Craig Liebendorfer, Scripting Guys