Hey, Scripting Guy! Question

Hey Scripting Guy! I have a log file that is produced by an application. The file is simply text, and is not really formatted in any discernable way. I can open the log in Notepad and use the <ctrl-f> (control –f) command to search the text file for specific things, but it would be much nicer if I had a different way to work with the data in the file.

-- AM

Hey, Scripting Guy! Answer

Hello AM,

Microsoft Scripting Guy Ed Wilson here. It is a warm (ok, downright hot, and I am engaging in wishful thinking) afternoon as I recline at my desk and look out at the dark blue sky with fluffy white clouds that drift lazily across the humidity-laden sky. It is not yet time for my afternoon tea, and it is too far past lunch to engage in flights of revelry and imagination. I am therefore trying to catch up on some of the e-mail that is currently clogging the scripter@microsoft.com inbox. Your question tripped my wayback switch and took me to a point in time when I was just entering the world of business and had my first desk. It was a massive gray metal monstrosity that had gate hinges and padlocks to keep prying eyes out of the bottom drawers, but it was MY DESK and I was proud to be its owner. The coolest thing about that old desk was not the electric typewriter that sat on the secretary’s wing, but the large two-foot-square desk calendar. This desk calendar covered a significant portion of the desk (hence the name) and was marked off by squares for each day of the current month. I used to write appointments, notes from phone conversations, outlines of ideas for process improvement, and telephone numbers on it; in short, everything I needed to remember during the month was logged onto that huge sheet of paper. At the end of the month, I would fold the calendar page and stuff it into a file folder in my padlocked bottom drawer.

The giant desk calendar was a great idea, and was an easy technology to implement, but its ease of use was trumped by the lack of certain features, most notably missing was search. The desk calendar contained a tremendous amount of valuable information, but it was very difficult to retrieve the information when I needed it. Your text file is very similar to my giant desk calendar: It is easy to use, but it is hampered by the difficulty in returning useful information. Windows PowerShell has a couple of tools that could aid you.

This week we will be reviewing some of the scripts that were submitted during the recently held 2009 Summer Scripting Games. The description of the 2009 Summer Scripting Games details all of the events. Each of the events was answered by a globally recognized expert in the field. There were some cool prizes, and winners were recognized from around the world. Additionally, just as at the "real Olympics," there was a lot going on, so an "if you get lost" page was created. Communication with participants was maintained via Twitter, Facebook, and a special forum. (The special forum has been taken down, but Twitter and Facebook are still used to communicate with Hey, Scripting Guy! fans.) We will be focusing on solutions that used Windows PowerShell. We have several good Hey, Scripting Guy! articles that introduce Windows PowerShell, and you will find them helpful.

The easiest way to return information from a text file is to use the Select-String cmdlet. If you have a text file such as the test.txt file found in the Competitor’s Pack, and you want to see if it has the word vbcrlf in it, you might use the Get-Content command to inspect it. This is seen here:

PS C:\> Get-Content -Path C:\data\fso\test.txt
This is a test file.
It is used to test stuff.
It has a vbcrlf at the end of each line.
This should make putting the text file together easy.
If there are not any cr, then the end of the line marker is not there.
This can cause problems when creating a file from comptuer data.
This would require the use of regex patterns i imagine.

But suppose you only want to know if the word vbcrlf exists in the file? You can use the Select-String cmdlet to both read the file and display the matching line. This is seen here:

PS C:\> Select-String -Path C:\data\FSO\test.txt -Pattern vbcrlf

data\FSO\test.txt:3:It has a vbcrlf at the end of each line.

One thing that is very powerful about the Select-String cmdlet is that it accepts a wildcard character in the path and can therefore return all matches of a specified pattern in all matching files. This is seen here:

PS C:\> Select-String -Path C:\data\FSO\*.txt -Pattern vbcrlf

data\FSO\test(2).txt:3:It has a vbcrlf at the end of each line.
data\FSO\test.txt:3:It has a vbcrlf at the end of each line.
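
Because Select-String emits objects rather than plain text, the parts of each match can be picked out by property name. The following is a minimal sketch of that idea; the temporary file name and its three lines are made up for illustration and are not part of the Competitor’s Pack.

```powershell
# Build a small throwaway file (hypothetical sample data) in the temp folder.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'hsg-select-string-demo.txt'
Set-Content -Path $path -Value @(
    'This is a test file.'
    'It is used to test stuff.'
    'It has a vbcrlf at the end of each line.'
)

# Select-String returns MatchInfo objects; each one records where the match was found.
$hits = @(Select-String -Path $path -Pattern 'vbcrlf')

foreach ($hit in $hits) {
    # LineNumber is 1-based; Line holds the full text of the matching line.
    '{0}: {1}' -f $hit.LineNumber, $hit.Line
}

# Clean up the throwaway file.
Remove-Item -Path $path
```

Working with the LineNumber and Line properties directly avoids having to pick apart the path:number:text display format by hand.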

If you need to perform a task that is more sophisticated, you will more than likely need to resort to using regular expressions. Regular expressions are very powerful, but with great power comes great complexity. On the Script Center, several Hey, Scripting Guy! articles have been written that talk about using regular expressions from VBScript as well as from within Windows PowerShell. 

The 2009 Summer Scripting Games Advanced Event 6 was the 110-meter hurdles event. In that task you were asked to parse a trace route log file, which was just text and therefore entailed using regular expressions to parse it. ChristianJost submitted a really cool script that parsed the network trace log file. His script is seen here. 

ScriptingGamesAdvancedEvent6.ps1

[regex]$RegexSplit = "\s{2,}"

$TracertInformations = @()
Get-Content 'C:\HSG_8_17_09\Network Trace_Adv6.txt'|
where {$_ -match "^\d|^\s{2}\d"}|%{$_.TrimStart("")}|
%{
    $Time1 = ""
    $Time2 = ""
    $Time3 = ""
    $Trace = New-Object System.Object
    $Trace |
       Add-Member -type NoteProperty -name Line -value ($RegexSplit.split("$_"))[0]
    # Replace times of <1 ms with 0
    # Keep * (request timed out) as the string "*"

    if ((($RegexSplit.split("$_"))[1]).split()[0] -match "<1") {[int]$Time1 = 0}
    elseif ((($RegexSplit.split("$_"))[1]).split()[0] -match "\*") {[string]$Time1 = "*"}
    else {[int]$Time1 = (($RegexSplit.split("$_"))[1]).split()[0]}

    $Trace |
      Add-Member -type NoteProperty -name Time1 -value $Time1
    # Same conversion for the second time column (array element 2)

    if ((($RegexSplit.split("$_"))[2]).split()[0] -match "<1") {[int]$Time2 = 0}
    elseif ((($RegexSplit.split("$_"))[2]).split()[0] -match "\*") {[string]$Time2 = "*"}
    else {[int]$Time2 = (($RegexSplit.split("$_"))[2]).split()[0]}

    $Trace |
      Add-Member -type NoteProperty -name Time2 -value $Time2
    # Same conversion for the third time column (array element 3)

    if ((($RegexSplit.split("$_"))[3]).split()[0] -match "<1") {[int]$Time3 = 0}
    elseif ((($RegexSplit.split("$_"))[3]).split()[0] -match "\*") {[string]$Time3 = "*"}
    else {[int]$Time3 = (($RegexSplit.split("$_"))[3]).split()[0]}

    $Trace |
       Add-Member -type NoteProperty -name Time3 -value $Time3
   
    $Trace |
      Add-Member -type NoteProperty -name Host -value ($RegexSplit.split("$_"))[4]
    $TracertInformations += $Trace
}



$TracertInformations |
sort-object -property time1 |
Select-object -last 1 |
format-table -auto

The first thing the script does is create a regular expression pattern that looks for any white space (\s) that occurs two or more times in a row, {2,}. An example of using this pattern is seen here:

PS C:\> [regex]$RegexSplit = "\s{2,}"

PS C:\> "this is a string" -match $RegexSplit

False

PS C:\> "this is a string with two  spaces" -match $RegexSplit

True

PS C:\> "this is a string with three   spaces" -match $RegexSplit

True

PS C:\> "this is a string with`t atab" -match $RegexSplit

True

The [regex] is a type accelerator that casts the string into a regular expression. This is seen here:

[regex]$RegexSplit = "\s{2,}"
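
The -match examples above only test whether the pattern is present, whereas the script uses the same [regex] object to split each line into fields. As a small sketch of that step, here is one hop line from the trace log (with its leading spaces already removed) run through the Split method; the variable names $line and $fields are chosen for illustration only.

```powershell
[regex]$RegexSplit = "\s{2,}"

# One hop line from the trace log, with the leading spaces already trimmed.
$line = '4    12 ms    14 ms    12 ms  r70.ntwk.nwtraders.com [192.226.42.12]'

# Split wherever two or more white-space characters occur in a row.
# Single spaces, such as the one between "12" and "ms", are left alone.
$fields = $RegexSplit.Split($line)

$fields.Count    # number of fields produced by the split
$fields[0]       # hop number
$fields[1]       # first round-trip time, for example "12 ms"
$fields[4]       # host name and address
```

Requiring at least two white-space characters is what keeps "12 ms" and the host name with its bracketed address together as single fields.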

The content of the file is read by using the Get-Content cmdlet, and the results are passed to the Where-Object cmdlet, where a regular expression pattern is used to search the contents of the file. The contents of the Network Trace_Adv6.txt file are seen here.

PS C:\> Get-Content 'C:\HSG_8_17_09\Network Trace_Adv6.txt'

C:\Documents and Settings\edwilson>tracert r60.ntwk.nwtraders.com

 

Tracing route to r60.ntwk.nwtraders.com [192.168.236.9]

over a maximum of 30 hops:

 

  1    <1 ms    <1 ms    <1 ms  r10.ntwk.nwtraders.com [192.168.227.82]

  2    <1 ms    <1 ms    <1 ms  r20.ntwk.nwtraders.com [192.168.169.1]

  3    <1 ms    <1 ms    <1 ms  r50.ntwk.nwtraders.com [192.226.42.47]

  4    12 ms    14 ms    12 ms  r70.ntwk.nwtraders.com [192.226.42.12]

  5    34 ms    34 ms    34 ms  r12.ntwk.nwtraders.com [192.226.42.9]

  6    34 ms    34 ms    34 ms  r40.ntwk.nwtraders.com [192.226.42.22]

  7   104 ms   104 ms   104 ms  r22.ntwk.nwtraders.com [192.226.34.80]

  8   105 ms   105 ms   105 ms  r32.ntwk.nwtraders.com [192.226.47.10]

  9   125 ms   125 ms   125 ms  r35.ntwk.nwtraders.com [192.226.41.198]

10   174 ms   173 ms   173 ms  r37.ntwk.nwtraders.com [192.226.38.14]

11   173 ms   173 ms   173 ms  r38.ntwk.nwtraders.com [192.226.226.119]

12     *        *        *     Request timed out.

13   389 ms   389 ms   389 ms  192.168.216.5

14   390 ms   390 ms   390 ms  r60.ntwk.nwtraders.com [192.168.236.9]

 

Trace complete.

 

C:\Documents and Settings\edwilson>

PS C:\>

As you can see, the file is not formatted. However, each hop line begins either with a number (^\d) or with two spaces and a number (^\s{2}\d), so the regular expression pattern is used to discard everything except the hop lines that are returned by the tracert command. The percent character (%) is an alias for the ForEach-Object cmdlet, and the TrimStart string method is used to remove any spaces from the beginning of each line. The trimmed lines are piped to the next command. This is seen here:

Get-Content 'C:\HSG_8_17_09\Network Trace_Adv6.txt'|
where {$_ -match "^\d|^\s{2}\d"}|%{$_.TrimStart("")}|
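
To see the filter at work without the log file, the same pattern can be applied to a handful of lines held in an array. This is only a sketch: the $lines array is a hand-copied subset of the trace log, and it calls TrimStart with no arguments, which also removes leading white space.

```powershell
# A few lines copied from the trace log shown earlier.
$lines = @(
    'Tracing route to r60.ntwk.nwtraders.com [192.168.236.9]'
    'over a maximum of 30 hops:'
    ''
    '  1    <1 ms    <1 ms    <1 ms  r10.ntwk.nwtraders.com [192.168.227.82]'
    '12     *        *        *     Request timed out.'
    'Trace complete.'
)

# Keep only lines that start with a digit, or with two spaces and then a digit,
# and strip the leading white space from what is left.
$hops = @($lines |
    Where-Object { $_ -match '^\d|^\s{2}\d' } |
    ForEach-Object { $_.TrimStart() })

# Only the two hop lines remain.
$hops
```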

The ForEach-Object cmdlet is used again to work with each object that comes across the pipeline. Some empty variables are created, as well as an instance of System.Object. The Add-Member cmdlet is used to add a NoteProperty to the System.Object that is stored in the $Trace variable. The first element in the array that is created by using the Split method (the hop number) is added to the Line property of the $Trace object. This is seen here:

%{
    $Time1 = ""
    $Time2 = ""
    $Time3 = ""
    $Trace = New-Object System.Object
    $Trace |
       Add-Member -type NoteProperty -name Line -value ($RegexSplit.split("$_"))[0]

The second element of the array (index 1) holds the first time value, such as “<1 ms”. It is split again on white space, and the first piece is tested. If that piece matches less than 1 (<1), the number 0 is assigned to the $Time1 variable. This line is a bit confusing because you are dealing with a string and not an actual number. Therefore, you are looking for the string “<1” in the first position. If “<1” is not found in the first position, an asterisk is looked for. Because the asterisk is a special character in the .NET Framework implementation of regular expressions, it must be escaped by using a backslash (“\*”) so that the regular expression engine knows you are looking for a literal asterisk character. If the asterisk character is found, the * character is assigned to the $Time1 variable. If neither of these cases matches, the value stored in the first position is converted to an integer and assigned to the $Time1 variable. The Add-Member cmdlet is used to add the value stored in the $Time1 variable to the Time1 property of the $Trace object. This is seen here:

    if ((($RegexSplit.split("$_"))[1]).split()[0] -match "<1") {[int]$Time1 = 0}
    elseif ((($RegexSplit.split("$_"))[1]).split()[0] -match "\*") {[string] $Time1 = "*"}
    else {[int]$Time1 = (($RegexSplit.split("$_"))[1]).split()[0]}

    $Trace |
      Add-Member -type NoteProperty -name Time1 -value $Time1
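
The three-way test can also be wrapped in a small function so that it is written once. The following is a sketch only, not part of the submitted script: ConvertTo-HopTime is a hypothetical helper name, and it assumes each field looks like "<1 ms", "12 ms", or "*", as in the sample log.

```powershell
[regex]$RegexSplit = "\s{2,}"

# Hypothetical helper: turn one time field such as "<1 ms", "12 ms", or "*"
# into the value that the script stores on the object.
function ConvertTo-HopTime {
    param([string]$Field)

    # Take the first white-space-delimited token: "<1", "12", or "*".
    $token = $Field.Split()[0]

    if ($token -match '<1')     { return 0 }          # faster than 1 ms
    elseif ($token -match '\*') { return '*' }        # no reply; asterisk escaped in the pattern
    else                        { return [int]$token } # ordinary time in milliseconds
}

$fields = $RegexSplit.Split('4    12 ms    14 ms    12 ms  r70.ntwk.nwtraders.com [192.226.42.12]')

ConvertTo-HopTime $fields[1]   # ordinary time field
ConvertTo-HopTime '<1 ms'      # sub-millisecond field
ConvertTo-HopTime '*'          # timed-out field
```

With a helper like this, each of the three time columns needs only one call, which makes it harder to mix up the array indexes.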

The same pattern is repeated three times to assign values to the Time1, Time2, and Time3 properties of the $Trace object. The value in the fifth position of the array (index 4) is added to the Host property, and the finished object is appended to the $TracertInformations array. This is seen here:

    $Trace |
      Add-Member -type NoteProperty -name Host -value ($RegexSplit.split("$_"))[4]
    $TracertInformations += $Trace
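
For comparison, on Windows PowerShell 3.0 or later the same kind of object can be built in a single step with the [pscustomobject] accelerator instead of repeated Add-Member calls. This is an alternative sketch, not the approach the submitted script takes, and to stay short it assumes every time field is an ordinary number, so it skips the "<1" and "*" handling.

```powershell
[regex]$RegexSplit = "\s{2,}"

# One hop line from the trace log, with the leading spaces already trimmed.
$line   = '5    34 ms    34 ms    34 ms  r12.ntwk.nwtraders.com [192.226.42.9]'
$fields = $RegexSplit.Split($line)

# [pscustomobject] needs Windows PowerShell 3.0 or later; property order is kept.
$trace = [pscustomobject]@{
    Line  = $fields[0]
    Time1 = [int]($fields[1].Split()[0])
    Time2 = [int]($fields[2].Split()[0])
    Time3 = [int]($fields[3].Split()[0])
    Host  = $fields[4]
}

$trace.Host
```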

After every $Trace object has been added to the $TracertInformations array, the array is piped to the Sort-Object cmdlet, where the objects are sorted on the Time1 property. Next, the last object in the sorted order is selected, and it is formatted by using the Format-Table cmdlet:

$TracertInformations |
sort-object -property time1 |
Select-object -last 1 |
format-table -auto
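
The sort-then-take-last idiom is easier to follow on a tiny data set. The sketch below uses three made-up objects whose Time1 values are all numbers; it deliberately leaves out the "*" entries that the real log produces.

```powershell
# Three made-up hop objects with numeric Time1 values.
$hops = @(
    New-Object PSObject -Property @{ Line = '4'; Time1 = 12 }
    New-Object PSObject -Property @{ Line = '7'; Time1 = 104 }
    New-Object PSObject -Property @{ Line = '1'; Time1 = 0 }
)

# Sort ascending by Time1, and then keep the last (largest) entry.
$slowest = $hops | Sort-Object -Property Time1 | Select-Object -Last 1

$slowest.Line
```

Because Sort-Object sorts in ascending order by default, the last object is the one with the largest Time1 value, which here is the hop on line 7.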

When the script is run, the following output is displayed:

Image of the output of the script


AM, thank you for your question. ChristianJost, we appreciate your awesome submission.

If you want to be the first to know what is happening on the Script Center, follow us on Twitter or on Facebook. If you need assistance with a script, you can post questions on the Official Scripting Guys Forum, or send an e-mail to scripter@microsoft.com. The 2009 Summer Scripting Games wrap-up will continue tomorrow. Until then, peace.


Ed Wilson and Craig Liebendorfer, Scripting Guys