Hey, Scripting Guy! How Can I Search a Microsoft Excel Workbook for Links to Other Workbooks?

Hey, Scripting Guy! How Can I Search a Microsoft Excel Workbook for Links to Other Workbooks?

  • Comments 2
  • Likes

Bookmark and Share 

 

Hey, Scripting Guy! Question

Hey, Scripting Guy! I have a user in our finance department that asked me today about using a Windows PowerShell script to identify Microsoft Excel workbooks that have links inside them to other Microsoft Excel workbooks. Frankly, I am not exactly sure what she is talking about, but I do think it is great that she asked me about Windows PowerShell. Can you help me out? I do not want this user to get all depressed the first time she attempts to do something with Windows PowerShell.

-- SV

 

Hey, Scripting Guy! AnswerHello SV,

Microsoft Scripting Guy Ed Wilson here. It has been many years since I created Microsoft Excel workbooks that contained links to other workbooks, or more accurately, links to spreadsheets in other workbooks. I believe I know the reason your user is asking for such a script, and that is because according to the Microsoft Excel online Help, there is no automated method of detecting such links. But if one of the workbooks becomes moved, renamed, or deleted, the workbook containing the link will not work properly. This is shown in the following image.

Image of broken workbook

 

The link to a spreadsheet in another Microsoft Excel workbook looks something like this: ='C:\fso\[cdRoms.xlsx]Sheet1'!$A$1. Therefore, it will be possible to write a Windows PowerShell script that will search a workbook for formulas that match that pattern. The Get-LinkedExcelSheets.ps1 script is an example of a script that does this. The complete Get-LinkedExcelSheets.ps1 script is shown here.

Get-LinkedExcelSheets.ps1

$path = "C:\fso"
$excelSheets = Get-Childitem -Path $path -Include *.xls,*.xlsx -Recurse
$excel = New-Object -comobject Excel.Application
$excel.visible = $false

foreach($excelSheet in $excelSheets)
{
 $workbook = $excel.Workbooks.Open($excelSheet)
 "There are $($workbook.Sheets.count) sheets in $excelSheet"

 For($i = 1 ; $i -le $workbook.Sheets.count ; $i++)
 {
  $worksheet = $workbook.sheets.item($i)
  "`tLooking for links on $($worksheet.name) worksheet"
  $rowMax = ($worksheet.usedRange.rows).count
  $columnMax = ($worksheet.usedRange.columns).count
  For($row = 1 ; $row -le $rowMax ; $row ++)
  {
   For($column = 1 ; $column -le $columnMax ; $column ++)
    {
     [string]$formula = $workSheet.cells.item($row,$column).formula
     if($formula -match "\w?:\\\w*\\\[\w*\.xls\w?\]") {"`t`t$($formula)"}
    } #end for $column
   } #end for $row
  $worksheet = $rowmax = $columnMax = $row = $column = $formula = $null
 } #end for
 $workbook.saved = $true
 $workbook.close()
} #end foreach
 
 $excel.quit()
 $excel = $null
 [gc]::collect()
 [gc]::WaitForPendingFinalizers()

For more information about using Windows PowerShell with the Microsoft Excel automation model, refer to the Microsoft Excel articles on the Hey, Scripting Guy! Blog.

To illustrate working with linked Microsoft Excel spreadsheets, I created a couple of Microsoft Excel workbooks containing spreadsheets with links to other spreadsheets in other workbooks. This is shown in the following image.

Image of Microsoft workbook with links

 

The first thing to do in the Get-LinkedExcelSheets.ps1 script is to assign the path to the folder that contains Microsoft Excel workbooks. The path is assigned to the $path variable. Next, use the Get-ChildItem cmdlet to retrieve all of the Microsoft Excel spreadsheets. The include parameter includes all of the legacy XLS workbooks as well as the new XLSX workbooks. The recurse parameter is required to make the Get-ChildItem cmdlet return all files that match the search criteria in the folder. This section of the script is shown here:

$path = "C:\fso"
$excelSheets = Get-Childitem -Path $path -Include *.xls,*.xlsx -Recurse

Now it is time to create an instance of the Excel.Application COM object. To do this, use the New-Object cmdlet and specify the name of the COM object. The application object is the main object that is used when working with Microsoft Excel automation. Because there are possibly many Microsoft Excel workbooks in the folder, it is not desirable to have the Microsoft Excel application appearing and disappearing during the running of the script. Therefore, the visible property from the Microsoft Excel application object is set to $false. This is shown here:

$excel = New-Object -comobject Excel.Application
$excel.visible = $false

When the Excel.Application object has been created, and the collection of Microsoft Excel workbooks has been generated, it is time to walk through the collection. To do this, use the ForEach statement as shown here:

foreach($excelSheet in $excelSheets)
{

Inside the ForEach statement, use the Open method from the Workbooks collection of the Excel.Application object. Pass the path that is stored in the $excelSheet variable to the Open method, and store the returned workbook object in the $workbook variable as shown here:

 $workbook = $excel.Workbooks.Open($excelSheet)

The sheets object, which is obtained from the workbook object, contains the count property that can be used to determine how many spreadsheets are contained in a Microsoft Excel workbook. This information is displayed in the Windows PowerShell console. A subexpression $() is required to prevent the COM object from unraveling inside the double quotation marks. The subexpression forces the value of the expression to be processed before the string is displayed. This is shown here:

 "There are $($workbook.Sheets.count) sheets in $excelSheet"

Now it is time to walk through the collection of spreadsheets. To do this, the For statement is used as shown here:

 For($i = 1 ; $i -le $workbook.Sheets.count ; $i++)

The item method from the sheets collection is used to retrieve a specific spreadsheet. Because the For statement will continue to operate once for each of the total number of spreadsheets in the collection, the $i variable is used to keep track of where the operation is within the collection. The worksheet object is stored in the $worksheet variable as shown here:

 {
  $worksheet = $workbook.sheets.item($i)

Next, a status message is displayed on the Windows PowerShell console. The `t character is a special character that causes the text to tab over one stop. A subexpression is used to retrieve the name of the worksheet via the name property. This is shown here:

  "`tLooking for links on $($worksheet.name) worksheet"

Now the number of rows and columns that are used within the spreadsheet are calculated via the count property of the range object. The number of rows and the number of columns are assigned to variables as shown here:

  $rowMax = ($worksheet.usedRange.rows).count
  $columnMax = ($worksheet.usedRange.columns).count

For each used row and each used column, the value of the formula property of the cell is retrieved and stored in the $formula variable. The [string] type is used to ensure the formula is returned as a string so that a regular expression pattern match can be used to look for linked spreadsheets. This is shown here:

  For($row = 1 ; $row -le $rowMax ; $row ++)
  {
   For($column = 1 ; $column -le $columnMax ; $column ++)
    {
     [string]$formula = $workSheet.cells.item($row,$column).formula

The If statement is used to see if the formula stored in the $formula variable matches a particular regular expression pattern. The regular expression pattern is not particularly complicated, but it is a bit cumbersome. The regular expression pattern is shown here:

\w?:\\\w*\\\[\w*\.xls\w?\]

A useful breakdown of the pattern is shown in the following table.

Regular expression

Meaning

\w

a word character [a-zA-Z_0-9]

?

one instance of preceding character

:

literal character

\

escapes a backslash

\

the backslash character

\w

a word character [a-zA-Z_0-9]

*

any instances of previous character

\

escape the \ character

\

literal character

\

escape the [ character

[

literal character

\w

a word character [a-zA-Z_0-9]

*

any instances of previous character

\

escape the . character

.

literal character

xls

literal characters

\w

a word character [a-zA-Z_0-9]

?

one instance of preceding character

\

escape the ] character

]

 literal character

If a match with the regular expression pattern is found, the formula is tabbed over two stops and displayed. The code to do this is shown here:

     if($formula -match "\w?:\\\w*\\\[\w*\.xls\w?\]") {"`t`t$($formula)"}

Now it is time to close out the loops, and re-initialize the variables:

    } #end for $column
   } #end for $row
  $worksheet = $rowmax = $columnMax = $row = $column = $formula = $null
 } #end for

To prevent the Save Workbook prompt from displaying, the saved property of the workbook object is set to $true. This is a safe operation because we know that we did not make any changes to the workbook. The workbook is then closed by using the close method from the workbook object. This is shown here:

 $workbook.saved = $true
 $workbook.close()
} #end foreach

It is time to release the memory used by the excel.application object. To do this, call the quit method from the application object. When the script is running from within the Windows PowerShell ISE, the memory is not released immediately. To do this, set the object to null, and call garbage collection. This is shown here:

 $excel.quit()
 $excel = $null
 [gc]::collect()
 [gc]::WaitForPendingFinalizers()

When the script runs, the following is shown.

Image of script output

 


SV, that is all there is to using Windows PowerShell to search a Microsoft Excel workbook and identify links to spreadsheets in other workbooks. This week, we are looking at a number of topics from a variety of technologies. And we will continue tomorrow when we will talk about…wait a minute.

If you want to know exactly what we will be looking at tomorrow, follow us on Twitter or Facebook. If you have any questions, send e-mail to us at scripter@microsoft.com or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

 

Ed Wilson and Craig Liebendorfer, Scripting Guys 

 

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
  • FYI: With Excel 2007 you can get a list of the link sources with the Workbook.LinkSources Method.

  • I came across this script and it was exactly what I was looking for as one of your departments use links and we are migrating our file server and we need to see what links we would break. One thing I am looking for is how do I export the results to a CSV file using Export-CSV. I have been racking my brain and I cannot find how to do this.