Learn about Windows PowerShell
Summary: Microsoft Scripting Guy, Ed Wilson, talks about using Windows PowerShell to trim strings and clean up data.
Microsoft Scripting Guy, Ed Wilson, is here. The Scripting Wife heads out today to spend time with her other passion at the Blue Ridge Classic Horse Show. Unfortunately, I will not get to attend much of that event due to a week of training that my group is doing. But I will have the chance to see at least one day of the events.
Speaking of the Scripting Wife’s passions, registration is open now for the Windows PowerShell Summit in Europe. The event will be held Sept 29 – Oct 1, 2014 in Amsterdam in The Netherlands. This will be an awesome event, and it will be a great chance to meet other PowerShellers. Teresa and I were in Amsterdam earlier this year. The city is beautiful, and the people were really friendly.
Note This post is the last in a series about strings. You might also enjoy reading:
One of the most fundamental rules for working with data is “garbage in, garbage out.” This means it is important to groom data before persisting it. It does not matter if the persisted storage is Active Directory, SQL Server, or a simple CSV file. One of the problems with “raw data” is that it may include stuff like leading spaces or trailing spaces that can affect sort and search routines. Luckily, by using Windows PowerShell and a few String methods, I can easily correct this situation.
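To make the problem concrete, here is a minimal sketch (the names in $names are made up for illustration and do not come from a real data source) that shows how a single stray leading space can hide a value from a search until the data is trimmed:

```powershell
# Hypothetical raw data: the second entry carries a stray leading space
$names = "bob", " carol", "alice"

# A search for the clean value misses the padded entry
$rawHit = $names -contains "carol"        # False

# Trim every entry first, then search and sort
$clean = $names | ForEach-Object { $_.Trim() }
$cleanHit = $clean -contains "carol"      # True
$sorted = $clean | Sort-Object            # alice, bob, carol
```

Trimming before the data is persisted means that every later search or sort sees the same clean values.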
The System.String .NET Framework class (which is documented on MSDN) has four Trim methods that enable me to easily clean up strings by removing unwanted characters and white space. The following table lists these methods.
Method | Description
Trim() | Removes all leading and trailing white-space characters from the current String object.
Trim(Char[]) | Removes all leading and trailing occurrences of a set of characters specified in an array from the current String object.
TrimEnd(Char[]) | Removes all trailing occurrences of a set of characters specified in an array from the current String object.
TrimStart(Char[]) | Removes all leading occurrences of a set of characters specified in an array from the current String object.
The easiest Trim method to use is the Trim() method. It is very useful, and it is the method I use most. It easily removes all white space characters from the beginning and from the end of a string. This is shown here:
PS C:\> $string = " a String "
PS C:\> $string.Trim()
a String
The method is that easy to use. I just call Trim() on any string, and it will clean it up. Unfortunately, the previous output is a bit hard to understand, so let me try a different approach. This time, I obtain the length of the string before I trim it, and I save the resulting string following the trim operation back into a variable. I then obtain the length of the string a second time. Here is the command:
$string = " a String "
$string.Length
$string = $string.Trim()
$string.Length
The command and the associated output from the command are shown in the following image:
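Because leading and trailing spaces are hard to see in console output, another option is to wrap the value in brackets before and after the trim. This is only a sketch of one way to do it. The $before and $after variable names are my own, and the bracket format is just a convention.

```powershell
$string = " a String "

# Wrap the value in brackets so the spaces at each end become visible
$before = "[{0}] length {1}" -f $string, $string.Length

$string = $string.Trim()
$after = "[{0}] length {1}" -f $string, $string.Length

$before   # [ a String ] length 10
$after    # [a String] length 8
```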
If there are specific characters I need to remove from both ends of a string, I can use the Trim(Char[]) method. This permits me to specify an array of characters to remove from both ends of the string. Here is an example in which I have a string that begins with “a ” and ends with “ a”. I pass an array consisting of “a” and “ ”, and the method removes those characters from both ends of the string. Here is the command:
$string = "a String a"
$string1 = $string.Trim("a"," ")
The command and its associated output are shown in the image that follows:
The cool thing is that I can also specify the characters in my array by their Unicode code values. This technique is shown here:
$string1 = $string.Trim([char]0x0061, [char]0x0020)
Dr. Scripto says:
“Convenient Unicode tables are available on Wikipedia.”
The following image illustrates this technique by displaying the command and the associated output from that command:
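If it is not obvious which code value belongs to which character, a quick check at the console can help. The following is a small sketch that casts in both directions. The $code and $hex variable names are only for illustration.

```powershell
# Cast a code value to a character
$letter = [char]0x0061                  # a
$isSpace = [char]0x0020 -eq ' '         # True

# Go the other way: look up the code value of a character
$code = [int][char]'a'                  # 97
$hex = '0x{0:X4}' -f $code              # 0x0061
```

This makes it easy to confirm that 0x0061 is the letter a and 0x0020 is a space before I use them in a Trim call.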
There are times when I know specifically that I need to trim characters from the end of a string. However, those characters might also be present at the beginning of the string, and I need to keep those characters. For these types of situations, I use the TrimEnd method. The cool thing is that I can use this method in three ways: I can specify the characters by Unicode code value, I can type the characters themselves, or I can supply no characters at all and let the method trim white space.
In this example, I create a string and then specify an array of specific Unicode characters by using the Unicode code value. Because the string begins with the same characters that it ends with, this is a good test to show how I can delete specific characters from the end of the string. Here is the string:
$string = "a String a"
I now specify two Unicode characters by code value to delete from the end of the string. I store the returned string in a variable as shown here:
$string1 = $string.TrimEnd([char]0x0061, [char]0x0020)
The command and the associated output are shown in the following image:
I can also specify the characters to trim by typing them. In this example, I type the <space>a characters as a single string. Windows PowerShell converts that string to an array of characters, so the method removes any trailing spaces and a characters, not just the exact sequence <space>a:
PS C:\> $string = "a String a"
PS C:\> $string.Length
10
PS C:\> $string1 = $string.TrimEnd(" a")
PS C:\> $string1
a String
PS C:\> $string1.Length
8
In the following example, I specify the characters as individual elements in an array. In fact, I do not even have them in the same order as they appear in the string. Yet, the results are the same.
PS C:\> $string1 = $string.TrimEnd("a", " ")
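The reason the order does not matter is that TrimEnd treats its arguments as a set of characters, not as a sequence to match. The following sketch uses a different test string (my own, chosen to make the effect visible) and also shows one possible way to remove only an exact suffix. The $suffix and $exact variable names are hypothetical.

```powershell
$string = "banana a a"

# Both calls pass the same set of characters: a space and the letter a.
# Every trailing space and a is removed, including the final a of "banana".
$fromString = $string.TrimEnd(" a")        # banan
$fromArray = $string.TrimEnd("a", " ")     # banan

# To remove only the exact suffix " a" once, test for it explicitly
$suffix = " a"
$exact = $string
if ($exact.EndsWith($suffix)) {
    $exact = $exact.Substring(0, $exact.Length - $suffix.Length)
}
$exact                                     # banana a
```

If the data can end with letters that are also in the trim set, the explicit suffix test is the safer choice.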
If I do not specify any characters, the TrimEnd method automatically deletes all Unicode white-space characters from the end of the string. In this example, a string contains both a space and a tab character at the end of the string. The length is 20 characters long. After I trim the end, the length is only 18 characters long, and both the space and the tab are gone. This technique is shown here:
PS C:\> $string = "a string and a tab `t"
PS C:\> $string.Length
20
PS C:\> $string1 = $string.TrimEnd()
PS C:\> $string1
a string and a tab
PS C:\> $string1.length
18
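White space here means more than the space and tab characters. As a sketch, the following builds a string that ends with a space, a tab, a line feed, and a no-break space (all specified by code value), and then confirms that TrimEnd() with no arguments removes every one of them. The $padding variable and the sample text are my own.

```powershell
# Build padding from a space, a tab, a line feed, and a no-break space
$padding = [char]0x0020, [char]0x0009, [char]0x000A, [char]0x00A0 -join ''
$string = "data" + $padding

$lengthBefore = $string.Length                 # 8
$lengthAfter = $string.TrimEnd().Length        # 4

# The .NET Framework classifies the no-break space as white space
$nbspIsWhiteSpace = [char]::IsWhiteSpace([char]0x00A0)   # True
```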
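If I need to trim stuff from the beginning of a string, I use the TrimStart method. It works exactly the same as the TrimEnd method, so it also works in three ways: with characters specified by Unicode code value, with typed characters, or with no characters at all to trim white space.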
As with the TrimEnd method, I can specify Unicode characters by Unicode code value. When I do this, the TrimStart method deletes those characters from the beginning of a string. This technique is shown here:
PS C:\> $string1 = $string.TrimStart([char]0x0061, [char]0x0020)
Instead of using the Unicode code values, I can simply type the array of string characters I want to delete.
Note One disadvantage of typing specific characters is that “ ” is kind of hard to interpret. Is it a space, a tab, or a null value? Is it a typing error, or is it intentional? By using a specific Unicode code value, I know exactly what is meant, and therefore the script is more specific.
In this example, I type specific characters that need to be removed from the beginning of the string:
PS C:\> $string1 = $string.TrimStart(" ", "a")
If I call the TrimStart method and do not supply specific characters, the method will simply delete all Unicode white space from the beginning of the string. This technique is shown here:
PS C:\> $string = " String a"
PS C:\> $string1 = $string.TrimStart()
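To tie this back to grooming data before persisting it, here is a minimal sketch that trims every string property on a set of records. The records, the property names, and the values are all made up for illustration. In practice, the objects might come from a CSV import or a similar source.

```powershell
# Hypothetical raw records with stray white space in several fields
$rows = @(
    [pscustomobject]@{ Name = "  Computer01 "; Site = "Building A`t" }
    [pscustomobject]@{ Name = "Computer02";    Site = " Building B" }
)

# Trim every string property on every record
foreach ($row in $rows) {
    foreach ($prop in $row.PSObject.Properties) {
        if ($prop.Value -is [string]) {
            $prop.Value = $prop.Value.Trim()
        }
    }
}

$rows[0].Name    # Computer01
$rows[1].Site    # Building B
```

The -is [string] test leaves numbers, dates, and other non-string values alone.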
That is all there is to using String methods. This also concludes String Week. Join me tomorrow when I will talk about exploring the Windows PowerShell $Profile variable.
I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at firstname.lastname@example.org, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.
Ed Wilson, Microsoft Scripting Guy
There are many articles that delve into removing a static beginning of a string. How do you remove a variable number of characters? For example, how do you remove the "CN=Computer01," from a string that contains the distinguishedName of a computer like "CN=Computer01,OU=Employees,DC=Domain,DC=com" so that all that is left is the container in which the computer resides, i.e., "OU=Employees,DC=Domain,DC=com"?
I think you can get what you want by using regular expressions (regex):
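$Computers = "CN=Computer01,OU=Employees,DC=Domain,DC=com"
# Pattern taken from the explanation that follows: "CN=", word characters, then a comma
$myregex = [regex]"CN=\w*,"
$Computers2 = $myregex.Split($Computers)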
After running this code, the $Computers2 variable is an array containing the expected information: "CN=xxx," has been removed from each line. The secret here is the regex "CN=\w*," which matches anything that begins with "CN=", is followed by any number of word characters (\w*), and ends with ",".
Let's verify:
Have a nice day
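As an alternative sketch for the same question (not the approach from the reply above), the -replace and -split operators can return the container as a single string. Note that Regex.Split leaves an empty first element when the match is at the start of the string. This sketch assumes that the computer name contains no escaped commas. The $dn, $container, and $container2 names are only for illustration.

```powershell
$dn = "CN=Computer01,OU=Employees,DC=Domain,DC=com"

# Remove the first component: "CN=", everything up to the next comma, and the comma
$container = $dn -replace '^CN=[^,]+,', ''
$container     # OU=Employees,DC=Domain,DC=com

# Or split at the first comma only and keep the second piece
$container2 = ($dn -split ',', 2)[1]
$container2    # OU=Employees,DC=Domain,DC=com
```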
In one case I have a text string coming into a variable, for example "Attribute1, Attribute2, Attribute3". Any ideas how to trim them in order to obtain this?
The string in my variable comes with @ as the separator, but I can't seem to put this together.
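One possible approach, offered only as a sketch because the exact input is not shown: assuming the incoming string looks something like the hypothetical $raw value below, split on the @ separator, trim each piece, and join the pieces with a comma and a space.

```powershell
# Assumed input format; the real string may differ
$raw = "Attribute1@ Attribute2 @Attribute3"

# Split on the separator, trim each piece, and join with ", "
$parts = $raw -split '@' | ForEach-Object { $_.Trim() }
$result = $parts -join ', '
$result    # Attribute1, Attribute2, Attribute3
```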
As usual - great post. Thank you.