Use a PowerShell Module to Easily Export Excel Data to CSV

Use a PowerShell Module to Easily Export Excel Data to CSV

  • Comments 9
  • Likes

Summary: Learn how to use a Windows PowerShell module to easily import and export Excel data.

 

Microsoft Scripting Guy Ed Wilson here. Jeremy Engel has joined us today to talk about using a Windows PowerShell module to easily import and export Excel data.

Take it away, Jeremy!

 

I started coding at the age of 10, writing games in BASIC for my lightning-fast Commodore 64. From there I ventured into the high-tech world of text-based online gaming, affectionately known as MUDs—specifically EverDark, where I learned to write C++ code. Later, I made the jump to systems engineering, and it was there that I discovered scripting. I loathe repetitive tasks and have a great love for efficiency (I’m lazy). Therefore, I can find ways to incorporate Windows PowerShell into almost every project and task I do as a systems engineer. I am also known to write the occasional application or service in Visual C#, which is mutually reinforcing with Windows PowerShell because they are essentially different expressions of the same language. Now for today’s blog post. Hope you like it.


Excel to Windows PowerShell and Back Again
 

Ever wish you could import from and export to Excel just like you can using CSV files? So did I, and so did a lot of other people, as I discovered. 

Back during Tech∙Ed this year, it seemed like I spent an entire day griping about this blatant disparity between reality and my imagination. Not that I kept bringing it up (I swear I did not!), but that it kept coming up in conversation. You know, like one of those weird days where the same esoteric word finds its way into a multitude of conversations, only to then never be heard of again until the next cosmic kink in the space-time continuum. 

Anyway, during dinner that night, it came up again. I casually blurt out that it is ridiculous that there are not any such functions, and how I could code them in a single night! Then it struck me—I actually think I could code them in a single night. Before I realized it, I was announcing my plans to the group of fellow Windows PowerShell scripters. So realizing I was losing hours, I dashed home by 8:30 P.M. and set to work. By 1:00 A.M., I had two fairly well fleshed-out cmdlets, but more importantly, they worked! 

The complete Excel module can be downloaded from the Scripting Guys Script Repository. 

As a basis for understanding how the conversion between Excel and Windows PowerShell works, think of each row in an Excel spreadsheet as an object in Windows PowerShell with the names of each column in Excel being a property name of the object in Windows PowerShell. 

Image of how Excel spreadsheets correlate to objects in Windows PowerShell

So if the value for A1 is Example1 and the value for A2 is test, in PowerShell: 

$row.Example1 = “test” 

If the value for A3 is ubu, in Windows PowerShell: 

$row.Example1 = “ubu” 

Each row object is then appended to an array of these objects, eventually representing the entire spreadsheet in Windows PowerShell object form.

Image of spreadsheet in Windows PowerShell object form

Got it? When you understand this, it becomes easy to see how we could move from Excel to Windows PowerShell and back again. All that is left then is to work out the mechanics. 

I really wanted this to be Excel-independent, meaning that I wanted to read an Excel document as an XML file and vice versa. I was able to get the export cmdlet to work fine writing to XML, but I was not able to figure out how to get the Excel XML to do an import—it was all just a jumbled mess. So unfortunately, I had to stick with the “wonderful” COM object. 

Honestly, COM objects are not that bad. They just look really bad when you’re expecting them to be as flexible, transparent, fast, and disposable as Windows PowerShell objects. That aside, they do have a fair amount of useful methods and properties, which I make use of in these two cmdlets. I will not elucidate them all here, but just show you some of the important ones. 

$excel = New-Object -ComObject Excel.Application

$workbook = $excel.Workbooks.Open($Path)

$sheet = $workbook.Worksheets.Item(1) 

As you can see, I am creating an instance of the Excel.Application object and then opening the workbook located at $Path and from there retrieving the first worksheet. What do you mean you have Excel workbooks with more than one worksheet? Okay, I’ll work on an enhancement later. When, you ask? Later. 

Here is another quirk about COM objects. If you take those first two lines of Windows PowerShell code shown above and open up Excel workbook, you’ll notice that typing $workbook.Worksheets in Windows PowerShell looks like it shows you an array of worksheet objects. Alas, no, it is actually another COM object. So I couldn’t just pull the object for sheet 1 by doing $workbook.Worksheets[0]; I had to use a COM method called Item(). Fun, huh? 

One last bit on COM objects. Apparently, COM objects are like that quasihomeless guy who sleeps on your couch. They are extremely hard to get rid of. I alluded to the problem in the preceding paragraph, which is that COM objects like to return more COM objects, rather than property values. So you think you may be querying a value, when what you’re getting is a sneaky cockroach of a COM object. 

Because of that, in each cmdlet I had to go through and try to repeatedly wipe out each object of the COM before finally killing the mothership. And wow, in doing my previous example on the worksheets, I just realized that even though I wasn’t saving the COM object references for workbooks and worksheets to Windows PowerShell variables, they were still lying out there. That is why I was sometimes having trouble killing the Excel process. So I had to adjust my script slightly to release those as well! 

$workbooks = $excel.Workbooks

$workbook = $workbooks.Open($Path)

$worksheets = $workbook.Worksheets

$sheet = $worksheets.Item(1)

 

do { $o = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($columns) } while($o -gt -1)

do { $o = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($rows) } while($o -gt -1)

do { $o = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($range) } while($o -gt -1)

do { $o = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($sheet) } while($o -gt -1)

do { $o = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($worksheets) } while($o -gt -1)

$workbook.Close($false)

do { $o = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($workbook) } while($o -gt -1)

do { $o = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($workbooks) } while($o -gt -1)

$excel.Quit()

do { $o = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($excel) } while($o -gt -1)

 

It isn’t pretty, but it gets the job done. 

I tried to make the Import-Excel cmdlet as flexible as possible, knowing how people often have messy spreadsheets. So I needed to account for situations when all of the columns were missing headers (the NoHeaders switch), or if only some of the columns had no headers. Under the latter situation, could also fall stray cells out in the middle of nowhere. I didn’t want to make value-judgments on any of that data; I just wanted to make sure I could logically ingest it into Windows PowerShell. To do that, the Import-Excel script creates generic property names for missing headers. 

In the previous screen shots, did you notice how the Excel document had no header for column B? The cmdlet noticed that too and gave it a generic property name, in this case: Column2

After you have imported the spreadsheet, you can manipulate the objects just like any other in Windows PowerShell. In the following figure, I fill in item 3’s Example1 value. 

Image of manipulating the object

In the following figure, I add an entirely new item to the object array. 

Image of adding new item to object array

Converse to Import-Excel is Export-Excel, which takes an array of objects and writes them to an Excel file. For a little flare, I added in the ability to make the header row bold, and also gave you three choices of bottom border, if so desired. 

Image of writing objects to an Excel file

And that, my friends, is how a self-imposed dare is won!

 

Thank you, Jeremy, this is a great post, and I look forward to using your new Excel module.

I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy

 

 

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
  • Jeremy,

    Take a look at <a href="wp.me/p15IqD-2Y">Remove-ComObject function</a>; it cleans up ComObject leftovers very well.

  • Jeremy,

    Take a look at <a href="http://wp.me/p15IqD-2Y" target="_blank">Remove-ComObject function</a>; it cleans up ComObject leftovers very well.

    PS: Ignore my previous comment :/

  • Please change the title to this article.  This is not about exporting excel to csv file but instaad in article about how to work with an excel file.

  • Ed, Jeremy

    jrv came up with a very clean way of closing out of excel without using a loop. I'm hoping we'll be able to figure out a way to do it with a single call instead of one per object, but check out this tread over in the forums

    social.technet.microsoft.com/.../81dcbbd7-f6cc-47ec-8537-db23e5ae5e2f

  • so i just started using your cmdlet btw awesome job. however i'm running into trouble. i have a single sheet work in xlsx format. 330 rows 11 columns of data. looks like the object will only hold 11 rows of data then hangs. excel looks like it still working in the background. when opening the workbook it will release it then listing the object shows only the 11 rows no matter how many times i try. i even created a test .xlsx file with simple header names and 330 11 columns of just single characters in each cell. same thing. do you know has anyone else ran into this type of issue with your cmdlet?

  • This is awsome. Any way to speed it up though... Trying to import excel with 300 rows.. Taking over 30 minutes...still shows "running script", please wait..

  • thanks

  • When you have an Excel spreadsheet with many sheets and some of thos sheets with connections to a DB, you will find your spreadsheet quite large and converting it to CSv will quickly run out of memory or at least take very long. Here is my attempt at getting only what you want.

    Function excel2csv([string]$ExcelFile,[string]$csvFile,[string]$SheetName,[int]$EndRow,[string]$EndCol)
    {
    #################################
    # convert Excel to CSV
    #################################
    #write-host -foregroundcolor "yellow" "Excel file " $ExcelFile
    #write-host -foregroundcolor "yellow" "CSV file " $csvFile
    #write-host -foregroundcolor "yellow" "Sheet " $SheetName
    #write-host -foregroundcolor "yellow" "Row " $EndRow
    #write-host -foregroundcolor "yellow" "Column " $EndCol
    if (!(Test-Path $ExcelFile))
    {write-host -foregroundcolor "red" "Excel file does not exist"; exit 1}

    $excelObject = New-Object -ComObject Excel.Application
    $excelObject.Visible = $false
    $excelObject.DisplayAlerts = $false
    $wb= $excelObject.Workbooks.Open($ExcelFile)
    $list = $wb.sheets | where {$_.name -eq "$SheetName"}
    $csv = $wb.sheets.add()
    for ($i=2; $i -lt $EndRow+1 ; $i++) {
    If ($list.Cells.Item($i,1).text -eq "") {
    $from = $list.Range("a1:$EndCol$($i-1)")
    $copy = $from.copy()
    $to = $csv.range("a1")
    $csv.paste($to)
    $i=$EndRow+1
    }
    }
    $xlCSVType = 6
    $csv.SaveAs("$csvFile",$xlCSVType)
    $wb.close()
    $excelObject.Quit()
    [System.Runtime.Interopservices.Marshal]::ReleaseComObject($wb) > $Null
    [System.Runtime.Interopservices.Marshal]::ReleaseComObject($excelObject) > $Null
    }

  • Some easier ways to code thisL

    function Export-ExcelCSV{
    [CmdletBinding()]
    Param(
    $ExcelWorkbook="$pwd\Book1.xlsx",
    $WorksheetName='Sheet1'
    )

    Try{
    Add-Type -AssemblyName Microsoft.Office.Interop.Excel
    $xl=New-Object -ComObject Excel.Application
    [void]$xl.Workbooks.Open($ExcelWorkbook)
    $xl.Worksheets.Item($WorksheetName).Select()
    $fi=[io.fileinfo]$xl.ActiveWorkbook.FullName
    $csvfile=$fi.FullName.Replace($fi.Extension,'.csv')
    Write-Verbose "Saving CSV to $csvfile"
    $xl.ActiveSheet.SaveAs($csvFile,[Microsoft.Office.Interop.Excel.XlFileFormat]::xlCSV)
    $xl.Quit()
    }
    Catch{
    Throw $_
    }
    While([System.Runtime.Interopservices.Marshal]::ReleaseComObject($xl)){}
    }

    Export-ExcelCSV -Verbose