Weekend Scripter: A Function to Clean Up Data Import

Weekend Scripter: A Function to Clean Up Data Import

  • Comments 2
  • Likes

Summary: Learn about a Windows PowerShell function you can use to clean up data prior to import.

Microsoft Scripting Guy, Ed Wilson, is here. We are entering the Labor Day weekend in the United States today. Labor Day is a sad holiday, but it is also a happy time. It is sad because it traditionally marks the end of summer. It is happy because if you live in Charlotte, North Carolina, it means that the time of high temperatures and high humidity are nearing an end.

Autumn (September through November) is definitely the best time of the year to visit Charlotte. The temperatures during the day are warm enough to permit short sleeve shirts, and in some cases short pants. In the evening, the temperatures are low enough that a nice jacket is all you need to keep warm. The leaves on the trees change colors, the sky clears, the humidity drops, and there is hardly any rain. Yes, it is my favorite time of the year around here.

Why a function?

During Data Manipulation Week, I have created a series of scripts to parse various aspects of data that came from a data dump. I created scripts to fix the date, the names, and the addresses that were presented inconsistently in a spreadsheet. This is the process I would follow when I needed to normalize data that I got from one database before I import it to another database. Yesterday, I created a single object from these portions of script. Now to provide a high level of versatility, I want to transform yesterday’s script into a function.

This process is actually pretty easy because I designed my separate scripts with integration in mind. I need to add the Function keyword, assign a name to the function, add a parameter, import the CSV file, and then close the function. That is about all I need to do. Here is the command I use to begin my function:

Function Convert-Data

{

Now I need to define a parameter. The parameter I need is a path to the CSV file that holds the data I want to import:

Param ([string]$path)

I need to import the CSV file and store it in a variable. I use the same $datain variable I have used all week. The path to the CSV file that will be imported is via the $path variable that I have set as a parameter to the function. This is shown here:

$datain = Import-CSV -Path $path 

I close the function:

} #end function Convert-Data

And the rest of the script is the same. In fact, because I created regions in my script yesterday, it is pretty easy to look at:

Foreach ($d in $datain)

    {

#region Name

 If($d.name -match  ',')

        {

          $name = (Get-Culture).textinfo.ToTitleCase($d.name).Split(',').trim()

          $ln = $name[0]

          $fn = $name[1]

          }

        ELSE {

         $name = $d.Name.Split().trim()

         $fn = $name[0]

         $ln = $name[1]

           }

#endregion

 

 #region Address

     $a = $d.address.split(',')

     $str = $a[0] -replace '(Street|street)','ST'

     $city = If($a[1] -match 'dwpt') {"Dewpoint"} ELSE {$a[1]}

     $state = if($a[2] -match 'South Carolina') {"SC"} ELSE {$a[2]}

     $zip = If($a[3].trim().Length -gt 5) {"Error for $a"} ELSE {$a[3]}

#endregion

 

#region Create Custom Object

     [PSCustomObject]@{

       Lname = $ln

       Fname = $fn

       Date = [datetime]("{0}/{1}/{2}" -f $d.month, $d.day, $d.year)

       Street = $str.ToUpper()

       City = $city.ToUpper()

       State = $state.Trim().Substring(0,2).ToUpper()

       Zip = $zip }

#endregion

    } 

The complete script is presented here:

Function Convert-Data

{

 Param ([string]$path)

 $datain = Import-CSV -Path $path

 Foreach ($d in $datain)

    {

#region Name

 If($d.name -match  ',')

        {

          $name = (Get-Culture).textinfo.ToTitleCase($d.name).Split(',').trim()

          $ln = $name[0]

          $fn = $name[1]

          }

        ELSE {

         $name = $d.Name.Split().trim()

         $fn = $name[0]

         $ln = $name[1]

           }

#endregion

 

 #region Address

     $a = $d.address.split(',')

     $str = $a[0] -replace '(Street|street)','ST'

     $city = If($a[1] -match 'dwpt') {"Dewpoint"} ELSE {$a[1]}

     $state = if($a[2] -match 'South Carolina') {"SC"} ELSE {$a[2]}

     $zip = If($a[3].trim().Length -gt 5) {"Error for $a"} ELSE {$a[3]}

#endregion

 

#region Create Custom Object

     [PSCustomObject]@{

       Lname = $ln

       Fname = $fn

       Date = [datetime]("{0}/{1}/{2}" -f $d.month, $d.day, $d.year)

       Street = $str.ToUpper()

       City = $city.ToUpper()

       State = $state.Trim().Substring(0,2).ToUpper()

       Zip = $zip }

#endregion

    }

} #end function Convert-Data

When I run the script in the ISE, it loads the function into memory. I can then call it directly from the console pane, as shown in the following image:

Image of command output

I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy 

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment