Hey, Scripting Guy! Event 1 *Solutions* from Expert Commentators (Beginner and Advanced; 100-meter dash)


2009 Summer Scripting Games  

(Note: These solutions were written for Event 1.) 

Beginner Event 1: The 100-Meter Dash

In the 100-meter event, you will be given the finish times of our runners. You will be asked to sort them and rank the gold, silver, and bronze winners.

Guest commentator: Steven Murawski

Steven Murawski is the Director of PowerShellCommunity.Org. His podcast can be heard on mindofroot.com. He maintains a blog at usepowershell.com. Steve agreed to provide a VBScript solution and a Windows PowerShell solution for the first event.

VBScript solution

My first thought after reviewing this problem was that this should not be too difficult. Reading files and looking for patterns in data are common tasks for IT pros of all persuasions. Here is the solution written in VBScript.

BeginnerEvent1Solution.vbs

Const ForReading = 1

Set objFS = CreateObject("Scripting.FileSystemObject")
Set objFile = objFS.OpenTextFile("C:\Scripts\100 Meter Event.txt", ForReading)

Set myRegExp = New RegExp
myRegExp.IgnoreCase = True
myRegExp.Pattern = "^(\w+?,\ .+?)\t(.+?)\s+(\d+\.\d+)$"


Const adVarChar = 200
Const MaxCharacters = 255
Const adFldIsNullable = 32
Const adDouble = 5

Set DataList = CreateObject("ADOR.Recordset")
DataList.Fields.Append "Name", adVarChar, MaxCharacters, adFldIsNullable
DataList.Fields.Append "Country", adVarChar, MaxCharacters, adFldIsNullable
DataList.Fields.Append "Time", adDouble, , adFldIsNullable
DataList.Open

Do Until objFile.AtEndOfStream
          strLine = objFile.ReadLine
          Set colMatches = myRegExp.Execute(strLine)
          For Each Match In colMatches
                   DataList.AddNew
                   DataList("Name") = Match.SubMatches(0)
                   DataList("Country") = Match.SubMatches(1)
                   DataList("Time") = Match.SubMatches(2)
                   DataList.Update
          Next
Loop

'close file
objFile.Close

DataList.Sort = "Time"
Wscript.Echo "Gold Medal: " & DataList.Fields.Item("Name") & " " & DataList.Fields.Item("Country") & " " & DataList.Fields.Item("Time")
DataList.MoveNext

Wscript.Echo "Silver Medal: " & DataList.Fields.Item("Name") & " " & DataList.Fields.Item("Country") & " " & DataList.Fields.Item("Time")
DataList.MoveNext

Wscript.Echo "Bronze Medal: " & DataList.Fields.Item("Name") & " " & DataList.Fields.Item("Country") & " " & DataList.Fields.Item("Time")

Windows PowerShell solution

Here is the solution written in Windows PowerShell.

BeginnerEvent1Solution.ps1

[regex]$regex = "^(?<Name>\w+?,\ .+?)\t(?<Country>.+?)\s+(?<Time>\d+\.\d+)$"
$Name = @{Name='Name';Expression={$_.groups["Name"].Value}}
$Country = @{Name='Country';Expression={$_.groups["Country"].Value}}
$Time = @{Name='Time';Expression={$_.groups["Time"].Value}}

$medals = 'Gold', 'Silver', 'Bronze'

$File = 'c:\scripts\100 Meter Event.txt'
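# Sort on the numeric value of Time; the captured value is a string, and a
# string sort would place a time such as 10.02 ahead of 9.58.
$finalist = Get-Content $File | ForEach-Object { $regex.Match($_) } | Where-Object {$_.Success} |
          Select-Object -Property $Name, $Country, $Time | Sort-Object -Property { [double]$_.Time } |
          Select-Object -First $Medals.Count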



for ($i = 0; $i -lt $Medals.count; $i++)
{
     Write-Host "$($Medals[$i]) Medal: $($Finalist[$i].Name) $($Finalist[$i].Country) $($Finalist[$i].Time)"
}

Here is Steven Murawski’s description of the approach he took to unraveling the mysteries of the 100-meter event.

The first thing I did was open the data file, 100 Meter Event.txt, and look for patterns in how the data was stored. The text file contained three categories of information: Name, Country, and Time. Names were stored as “last name, first name”. After the name, there was at least one white-space character and then the country. Countries contained both single-word names and compound-word names. After that, at least one white-space character was followed by the time of the runner.

Because there was a pattern to the data, but not a unique breaking character (spaces were in the middle of the country names, there was a comma in the name but nowhere else, and some of the spacing characters were tabs and others were single spaces), I decided to use a regular expression to parse the text.

To read in the file, I created an instance of the FileSystemObject object and used that to open the text file. This is seen here:

Const ForReading = 1
Set objFS = CreateObject("Scripting.FileSystemObject")
Set objFile = objFS.OpenTextFile("C:\Scripts\100 Meter Event.txt", ForReading)

After opening the file, I created my regular expression object:

Set myRegExp = New RegExp
myRegExp.IgnoreCase = True
myRegExp.Pattern = "^(\w+?,\ .+?)\t(.+?)\s+(\d+\.\d+)$"

(For more information on regular expressions, check out MSDN.)

My regular expression takes the first set of word characters (letters, digits, or underscores) up to a comma, then a space and everything that follows up to a tab character, and creates the first group (the name) from that. The second group (the country) is everything after that tab up to the white space preceding the time. Finally, the digits, the period, and the digits that follow it are saved as the third group.
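
To make the three capture groups concrete, here is a minimal sketch in Windows PowerShell that applies the same pattern to a single line. The sample line (a made-up runner, a tab, a two-word country, a space, and a time) is an assumption for illustration; it is not taken from the actual 100 Meter Event.txt file.

```powershell
# Same pattern as in the scripts above, with named groups for readability.
[regex]$regex = '^(?<Name>\w+?,\ .+?)\t(?<Country>.+?)\s+(?<Time>\d+\.\d+)$'

# Hypothetical sample line: "Last, First" <tab> "Country" <space> "Time".
$sample = "Doe, Jane`tNew Zealand 10.25"

$match = $regex.Match($sample)
if ($match.Success) {
    $runnerName = $match.Groups['Name'].Value      # text before the tab
    $country    = $match.Groups['Country'].Value   # may contain spaces
    $time       = $match.Groups['Time'].Value      # digits, period, digits
    "Name='$runnerName' Country='$country' Time='$time'"
}
```

Because the country group is lazy and must be followed by white space and a number at the end of the line, a country with a space in its name still lands in a single group.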

To store these values, I created an in-memory dataset and opened it up:

Set DataList = CreateObject("ADOR.Recordset")
DataList.Fields.Append "Name", adVarChar, MaxCharacters, adFldIsNullable
DataList.Fields.Append "Country", adVarChar, MaxCharacters, adFldIsNullable
DataList.Fields.Append "Time", adDouble, , adFldIsNullable
DataList.Open

Next, I looped through the text file, reading each line and matching it against my regular expression. If there was a match, I added the result to the dataset:

Do Until objFile.AtEndOfStream
          strLine = objFile.ReadLine
          Set colMatches = myRegExp.Execute(strLine)
          For Each Match In colMatches
                   DataList.AddNew
                   DataList("Name") = Match.SubMatches(0)
                   DataList("Country") = Match.SubMatches(1)
                   DataList("Time") = Match.SubMatches(2)
                   DataList.Update
          Next
Loop

After creating the dataset, I sorted it based on time:

DataList.Sort = "Time"

Then it was just a matter of returning the top three results and displaying them on the screen:

Wscript.Echo "Gold Medal: " & DataList.Fields.Item("Name") & " " & DataList.Fields.Item("Country") & " " & DataList.Fields.Item("Time")
DataList.MoveNext
Wscript.Echo "Silver Medal: " & DataList.Fields.Item("Name") & " " & DataList.Fields.Item("Country") & " " & DataList.Fields.Item("Time")
DataList.MoveNext
Wscript.Echo "Bronze Medal: " & DataList.Fields.Item("Name") & " " & DataList.Fields.Item("Country") & " " & DataList.Fields.Item("Time")

Running the VBScript script should give you the following:

Image of the output from running the VBScript script

 

Here is the output I obtained when I ran the Windows PowerShell script:

Image of the results of running the Windows PowerShell script
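
One detail worth checking in the Windows PowerShell version is how the Time values are compared. A regular expression capture is a string, and Sort-Object compares strings character by character. The following sketch uses made-up times (not the event data) to show the difference a [double] cast can make when a time has two digits before the decimal point.

```powershell
# Hypothetical times, as strings, the way a regex capture would return them.
$times = '10.02', '9.58', '9.97'

# String comparison: '10.02' sorts first because '1' comes before '9'.
$asText = $times | Sort-Object

# Numeric comparison: cast each value to [double] inside the sort expression.
$asNumber = $times | Sort-Object { [double]$_ }

"As text:    $($asText -join ', ')"
"As numbers: $($asNumber -join ', ')"
```

If every time in the file has the same number of digits, both sorts agree, so the cast only matters for mixed-width values.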

 

Advanced Event 1: The 100-Meter Dash

The 100-meter event is the shortest outdoor distance. In this event, you will be required to read a text file and determine the shortest lines of text that it contains. 

Guest commentator: Kirk Munro


Kirk Munro is a PowerShell Solutions Architect and Windows PowerShell MVP. He maintains the Poshoholic.com Web site and tweets on Twitter.com/poshoholic.  

Windows PowerShell solution

Sprint. That is the name of the game for me these days. Lots to do, little time to do it. When the Scripting Guys asked me to provide a solution for the Advanced division of Event 1, I thought it was very appropriate to my work because it’s all about finding the shortest paths to get the job done. In addition, I thought it fitting to provide a nice, short solution to do it too, so let’s get started. First of all, here is the solution.

AdvancedEvent1Solution.ps1

param(
          $LiteralPath = '.\Personal Information Cards_ADV1.txt',
          $Count = 1
)

Get-Content -LiteralPath $LiteralPath `
          | Where-Object {$_.Trim()} `
          | Sort-Object {$_.Trim().Length} `
          | Select-Object -First $Count

At first glance, the problem is straightforward: Read in the contents of a file and determine what the three shortest lines of text contain. When you look a little more closely at the file, though, you’ll quickly realize that you can’t always count on things working as you expect. Some of the lines start with white space, and other lines contain nothing but white space. I do not consider white space to be text, so white space should be ignored; we will have to remember that when writing the solution.

Now that we’ve looked at the file we’re dealing with, the next step is to break down the problem into a set of manageable tasks. For this event, I came up with the following tasks:

·       Read the contents of the text file.

·       Filter out any lines that contain nothing but whitespace.

·       Sort the remaining lines by the length of the text (remembering to strip the white space when determining the length).

·       Show the first three lines (these will be the shortest lines after they are sorted).

This is a manageable list of tasks, so now we need to figure out what Windows PowerShell offers to solve each task. Fortunately, Windows PowerShell comes with an appropriate cmdlet for each of these tasks, making things pretty easy for us. Here are the cmdlets we will be able to use:

·       Get-Content allows you to read the contents of files.

·       Where-Object allows you to filter objects.

·       Sort-Object allows you to sort objects.

·       Select-Object allows you to pick which objects you want.

The only other detail we need to know is that you can use the Trim method on strings to remove leading and trailing white space and the Length property on strings to determine the length. At this point, we have enough details to create our script in our favorite script editor (mine is PowerGUI). Here is what my Get-ShortestLine.ps1 script looks like:

param(
          $LiteralPath = '.\Personal Information Cards_ADV1.txt',
          $Count = 1
)

Get-Content -LiteralPath $LiteralPath `
          | Where-Object {$_.Trim()} `
          | Sort-Object {$_.Trim().Length} `
          | Select-Object -First $Count

I threw a few parameters into the mix to give the script some flexibility, but basically it’s as simple as piping together the cmdlets chosen for each task and using trim where appropriate to prevent white space from skewing our results. Below you can see the results when I ran my script:

Image of the results of running the script
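
To see why Where-Object {$_.Trim()} is enough to drop blank lines, it helps to know that an empty string converts to $false. The sketch below runs the same pipeline shape against a handful of made-up strings instead of the real file; the sample values are assumptions chosen to cover each case.

```powershell
# Hypothetical lines: leading spaces, white space only, empty, and plain text.
$lines = '   padded', "`t  ", '', 'ok', 'longer text'

# An empty string converts to $false, so lines that trim down to nothing are dropped.
$nonBlank = $lines | Where-Object { $_.Trim() }

# Sort by trimmed length so leading white space does not count against a line.
$shortest = $nonBlank | Sort-Object { $_.Trim().Length } | Select-Object -First 2

$shortest | ForEach-Object { '"{0}" (trimmed length {1})' -f $_, $_.Trim().Length }
```

Note that the lines themselves are passed along untrimmed; only the filter and the sort key look at the trimmed text.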

I hope that through this solution I have given you a few Windows PowerShell tips and tricks that you can use in your own scripts.

Guest commentator: Stephanie Peters


Stephanie Peters has worked for Microsoft for more than 10 years. She is a senior premier field engineer and a veteran scripting trainer. She writes a very popular blog on TechNet, Something About Scripting.

VBScript solution

I was never much of a sprinter, actually. I am more of a survey the track, determine the strategy, and work your paces kind of girl. In this event, if you were to try to sprint off writing code at the starting gun, you might find yourself in a bit of trouble. There are a few things that need to be worked out before getting out of the blocks. OK, that’s enough with the play-on-the-track theme.

Seriously, though, there were a number of steps I had to work out before I hit my stride on this one. (Oops, sorry—will not do it again.) 

The scenario sounds simple enough:

“In this event, you will be required to read the Personal Information Cards_ADV1.txt file and determine the shortest lines of text that the file contains. You will need to display to the console, the first three shortest lines.”

Here is my solution to this event:

AdvancedEvent1Solution.vbs

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'                                                                     '
' Adv_1.vbs                                                           '
'   written by Stephanie Peters, Microsoft PFE                        '
'   for the 2009 Summer Scripting Games                               '
'                                                                     '
' The goal of this script is to read a particular text file           '
' (Personal Information Cards_ADV1.txt) and display the first three   '
' shortest lines from that file.                                      '
'                                                                     '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Option Explicit

' Constants specify parameters required in the scenario
Const FILE_NAME = "Personal Information Cards_ADV1.txt"
Const N_SHORTEST_LINES = 3

' Parameter Variables
Dim SKIP_EMPTY_LINES
Dim SKIP_WHITESPACE_ONLY
Dim TRIM_LINES
Dim UNIQUE
' Script Variables
Dim objLengthDictionary, objDupeDictionary     
Dim objFSO, objFile                                      
Dim strCurrentPath, strFilePath                
Dim strCurrentLine, intCurrentLineLength, strCurrentLineNumber, strPad
Dim blnSkipThisLine
Dim intMaxLineLength, intMinLineLength
Dim intFileLineCounter, intLengthCheckCounter, intShortLineCounter
Dim aryCurrentLines

' Initialize Parameters
SKIP_EMPTY_LINES = Wscript.Arguments.Named.Exists("SkipEmpty")
SKIP_WHITESPACE_ONLY = Wscript.Arguments.Named.Exists("SkipWhiteLines")
TRIM_LINES = Wscript.Arguments.Named.Exists("Trim")
UNIQUE = Wscript.Arguments.Named.Exists("Unique")

' Set up objects for later use
Set objLengthDictionary = CreateObject("Scripting.Dictionary")
Set objDupeDictionary = CreateObject("Scripting.Dictionary")
Set objFSO = CreateObject("Scripting.FileSystemObject")

' Locate text file in the same folder with the currently running script
strCurrentPath = Replace(Wscript.ScriptFullName,Wscript.ScriptName,"")
strFilePath = strCurrentPath & FILE_NAME

' Open the text file for reading
Set objFile = objFSO.OpenTextFile(strFilePath)

' Read the file one line at a time
Do Until objFile.AtEndOfStream
          ' Keep track of which line we are currently reading
          intFileLineCounter = intFileLineCounter + 1
          strCurrentLine = objFile.ReadLine
          ' Reset blnSkipThisLine to False for new line
          blnSkipThisLine = False

          ' Use Logical implication to determine whether
          ' to skip lines based on white space
          If Not (SKIP_EMPTY_LINES Imp CBool(Len(strCurrentLine))) Then
                   blnSkipThisLine = True
          ElseIf Not (SKIP_WHITESPACE_ONLY Imp CBool(Len(Trim(strCurrentLine)))) Then
                   blnSkipThisLine = True
          Else

                   ' Trim leading and trailing whitespace if specified
                   If TRIM_LINES Then
                             strCurrentLine = Trim(strCurrentLine)
                   End If

                   ' Determine whether to skip the line based on uniqueness
                   ' if specified
                   If UNIQUE Then
                             ' If the line is in objDupeDictionary, then it has
                             ' already been encountered and is a duplicate.
                             If objDupeDictionary.Exists(strCurrentLine) Then
                                      blnSkipThisLine = True
                             Else
                                      ' This item is not a duplicate and needs to be
                                      ' added to the running list.
                                      objDupeDictionary(strCurrentLine) = Null
                             End If
                   End If                                                                      
          End If            

          ' If a line hasn't been marked to skip, then process it
          If Not blnSkipThisLine Then
                   intCurrentLineLength = Len(strCurrentLine)
                   ' Keep track of max and min line lengths for later use.
                   If intCurrentLineLength > intMaxLineLength Then
                             intMaxLineLength = intCurrentLineLength
                   End If
                   If intCurrentLineLength < intMinLineLength Then
                             intMinLineLength = intCurrentLineLength
                   End If
                   ' Pad the leading zeros on the text file line number so the
                   ' display will be aligned.
                   strPad = String(3-Len(intFileLineCounter), "0")
                   strCurrentLineNumber = strPad & intFileLineCounter
                   ' Prepend the line number to the line and format it for output.
                   strCurrentLine = "Line " & strCurrentLineNumber & ": """ & _
                             strCurrentLine & """"
                   ' If this line is a new length, then add it to the dictionary,
                   ' otherwise append it to the existing item for this same length.

                   If objLengthDictionary.Exists(intCurrentLineLength) Then
                             objLengthDictionary(intCurrentLineLength) = _
                                      objLengthDictionary(intCurrentLineLength) & _
                                      vbNewLine & strCurrentLine
                   Else
                             objLengthDictionary(intCurrentLineLength) = strCurrentLine
                   End If
          End If
Loop

' intShortLineCounter tracks which of nth shortest line we're looking for.
' We start at 1 because the first line we're looking for is the 1st shortest
' line.
intShortLineCounter = 1
For intLengthCheckCounter = intMinLineLength To intMaxLineLength
          ' If this length is found in the Dictionary, then we'll use it.
          If objLengthDictionary.Exists(intLengthCheckCounter) Then
                   ' write the header for this particular grouping
                   wscript.echo
                   wscript.echo "Number " & intShortLineCounter & _
                             " shortest line(s) - Length=" & intLengthCheckCounter & ":"
                   wscript.echo "===================================="
                   ' Empty lines must be handled separately
                   If objLengthDictionary(intLengthCheckCounter)="" Then
                             wscript.echo objLengthDictionary(intLengthCheckCounter)
                             intShortLineCounter = intShortLineCounter + 1
                   Else
                             ' In the event of a tie, the dictionary value will
                             ' contain multiple values, so we need to separate them
                             ' using the Split function
                             aryCurrentLines = _
                                      Split(objLengthDictionary(intLengthCheckCounter), _
                                      vbNewLine)
                             For Each strCurrentLine in aryCurrentLines
                                      wscript.echo strCurrentLine
                                      intShortLineCounter = intShortLineCounter + 1
                             Next
                   End If
                   ' Once the required number of shortest lines have been
                   ' discovered, then we're done.
                   If intShortLineCounter > N_SHORTEST_LINES Then
                             Exit For
                   End If
          End If
Next
wscript.echo
wscript.echo

Preliminary questions

The preliminary questions are way too simple. In fact, these are some of the vaguest specs I have ever seen—and I have seen all sorts of specs. My mind raced to clarifying questions:

·       Should empty lines be displayed?

·       How about lines with only white space characters?

·       Should leading and trailing white space be trimmed from the lines before calculating the length?

·       If there are two identical lines, should they be handled separately or aggregated into unique lines only?

(Scripting Guys Note: The vague specs were intentional. It is part of what makes an advanced Scripting Games scenario more fun. It also gives you the flexibility to make choices and decisions for yourself. Because you are helping to decide upon the specifications for the scenario, you are the one who is in charge of your own destiny, and ultimately the one who will decide if you met or exceeded your personal goal.) 

Of course, I didn’t have much luck getting these questions answered—a situation I’ve been in before. That being the case, I usually find it best to write into the script the flexibility to handle as many scenarios as reasonably possible and let the user choose how he or she would like to go forward. I decided early on to allow the user to specify how to handle these options.

Some other specification questions I had to answer for myself included the following:

·       How should I handle a tie?

·       What is the best way to display the lines in a way that makes the essence of the result clear?

Strategy

You’ll see how I handled those a little later. After I had decided these particulars, I worked out a strategy I could execute. The first idea I had was that I should sort the lines based on length. However, the scenario makes no mention of having to do a full sort, so I decided I would do the following:

·       Read each line of the file one by one.

·       Make a decision as to if/how the line should be processed.

·       Store the line in a Scripting.Dictionary object—keyed on the length of the line.

·       Iterate possible lengths, starting with the shortest, and retrieve the matching lines from the Dictionary object until the required three shortest lines have been found.

·       Display a header for each line length.

·       Display each line that corresponds to that length enclosed in double quotation marks (to make potential white space apparent) and preceded by its line number in the file.
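
The bucketing idea behind this strategy can be sketched in a few lines of Windows PowerShell, with a hashtable standing in for the Scripting.Dictionary object. This is only an illustration under simplifying assumptions: the input lines are made up, and the filtering options, headers, and line numbers are left out.

```powershell
# Hypothetical input lines; the real script reads them from the text file.
$lines = 'delta', 'pi', 'alpha', 'omega!', 'mu', 'a'

# Bucket the lines by length: key = length, value = the lines of that length.
$byLength = @{}
foreach ($line in $lines) {
    $len = $line.Length
    if (-not $byLength.ContainsKey($len)) {
        $byLength[$len] = @()
    }
    $byLength[$len] += $line
}

# Walk the possible lengths upward and stop once three lines have been collected.
$wanted = 3
$found = @()
$maxLength = ($byLength.Keys | Measure-Object -Maximum).Maximum
for ($len = 0; $len -le $maxLength -and $found.Count -lt $wanted; $len++) {
    if ($byLength.ContainsKey($len)) {
        # Ties share a bucket, so a whole bucket is reported together.
        $found += $byLength[$len]
    }
}
$found
```

Because a whole bucket is added at once, a tie at the cutoff can return more than three lines, which is one possible answer to the "how should I handle a tie?" question.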

With strategy in hand, it’s off to the, uh, scripting. For this script, we are following the four-part scripting model that was introduced in the Microsoft Press book, Microsoft Windows Scripting Self-Paced Learning Guide. 

Header

Last things first: the script header. It’s important, but honestly, it was the last thing I finished:

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'                                                                     '
' Adv_1.vbs                                                           '
'   written by Stephanie Peters, Microsoft PFE                        '
'   for the 2009 Summer Scripting Games                               '
'                                                                     '
' The goal of this script is to read a particular text file           '
' (Personal Information Cards_ADV1.txt) and display the three         '
' shortest lines from that file.                                      '
'                                                                     '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Option Explicit

' Constants specify parameters required in the scenario
Const FILE_NAME = "Personal Information Cards_ADV1.txt"
Const N_SHORTEST_LINES = 3

' Parameter Variables
Dim SKIP_EMPTY_LINES
Dim SKIP_WHITESPACE_ONLY
Dim TRIM_LINES
Dim UNIQUE
' Script Variables
Dim objLengthDictionary, objDupeDictionary     
Dim objFSO, objFile                                      
Dim strCurrentPath, strFilePath                
Dim strCurrentLine, intCurrentLineLength, strCurrentLineNumber, strPad
Dim blnSkipThisLine
Dim intMaxLineLength, intMinLineLength
Dim intFileLineCounter, intLengthCheckCounter, intShortLineCounter
Dim aryCurrentLines

Reference

With mandatory housekeeping out of the way, it’s time to set up some reference items for later use. First, we read the arguments I mentioned earlier to provide flexibility in the way the user runs the script. The argument values are stored in the four variables you see below. These can be provided as command-line arguments to the script, as we will see later.

' Initialize Parameters
SKIP_EMPTY_LINES = Wscript.Arguments.Named.Exists("SkipEmpty")
SKIP_WHITESPACE_ONLY = Wscript.Arguments.Named.Exists("SkipWhiteLines")
TRIM_LINES = Wscript.Arguments.Named.Exists("Trim")
UNIQUE = Wscript.Arguments.Named.Exists("Unique")
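
With Windows Script Host, named arguments are written with a leading slash (for example, /Trim), and Exists simply reports whether the argument is present. For comparison, the closest Windows PowerShell idiom is a [switch] parameter. The function below is a hypothetical sketch whose parameter names mirror the VBScript argument names; it is not part of either solution.

```powershell
# Hypothetical function; the parameter names mirror the VBScript named arguments.
function Get-LineOption {
    param(
        [switch]$SkipEmpty,
        [switch]$SkipWhiteLines,
        [switch]$Trim,
        [switch]$Unique
    )
    # A switch is $true only when it is present on the command line,
    # much like Wscript.Arguments.Named.Exists in the VBScript version.
    @{
        SkipEmpty      = [bool]$SkipEmpty
        SkipWhiteLines = [bool]$SkipWhiteLines
        Trim           = [bool]$Trim
        Unique         = [bool]$Unique
    }
}

$options = Get-LineOption -SkipEmpty -Trim
"SkipEmpty=$($options.SkipEmpty) Trim=$($options.Trim) Unique=$($options.Unique)"
```

In both languages, an option that is not mentioned defaults to False, so running the script with no arguments keeps every line as it is.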

 

Second, we’ll initialize some COM objects: two Dictionaries and a FileSystemObject. Both kinds of object are included with Windows Script Host, so they will be available on the Scripting Games test computer. Seasoned VBScripters will be familiar with the Dictionary object, but others may not be. The Dictionary holds a dynamic collection of key/value pairs and is good for a couple of things we’re going to need in this script: (1) removing duplicate items from a list, and (2) generating a quick index of values.

The FileSystemObject object is fairly straightforward. It’s going to allow us to read the text file from the file system:

' Set up objects for later use
Set objLengthDictionary = CreateObject("Scripting.Dictionary")
Set objDupeDictionary = CreateObject("Scripting.Dictionary")
Set objFSO = CreateObject("Scripting.FileSystemObject")

 

Before we open the “Personal Information Cards_ADV1.txt” file for reading, we have to locate it. I could have hard-coded the full path to it, but then it wouldn’t necessarily work for anyone else, especially because the file is currently in My Documents. We can use the ScriptFullName and ScriptName properties of the Wscript object to deduce the parent folder of the currently running script. This means that the text file and the script have to be in the same folder:

' Locate text file in the same folder with the currently running script
strCurrentPath = Replace(Wscript.ScriptFullName,Wscript.ScriptName,"")
strFilePath = strCurrentPath & FILE_NAME
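
The same "look beside the script" idea can be sketched in Windows PowerShell with Split-Path and Join-Path. The helper name Get-SiblingPath and the sample script path are made up for illustration; inside a real script file, $MyInvocation.MyCommand.Path would supply the script's full path.

```powershell
# Hypothetical helper: returns the path of a file that sits beside a given script.
function Get-SiblingPath {
    param(
        [string]$ScriptPath,
        [string]$FileName
    )
    $folder = Split-Path -Parent $ScriptPath
    Join-Path -Path $folder -ChildPath $FileName
}

# A made-up relative path is used here so the sketch can be run on its own.
$scriptPath = Join-Path -Path (Join-Path -Path 'Games' -ChildPath 'Event1') -ChildPath 'Adv_1.ps1'
Get-SiblingPath -ScriptPath $scriptPath -FileName 'Personal Information Cards_ADV1.txt'
```

Splitting off the parent folder avoids one quirk of the Replace approach, which removes every occurrence of the script name from the full path, not only the last one.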

 

Worker

Now comes the processing of the text file—reading it one line at a time.

' Open the text file for reading
Set objFile = objFSO.OpenTextFile(strFilePath)

' Read the file one line at a time
Do Until objFile.AtEndOfStream
          ' Keep track of which line we are currently reading
          intFileLineCounter = intFileLineCounter + 1
          strCurrentLine = objFile.ReadLine
          ' Reset blnSkipThisLine to False for new line
          blnSkipThisLine = False

 

Here I need to point out that I’ve been waiting about 12 years to use the logical implication (Imp) operator. I never found the occasion to use it, and I’m not sure why it struck me to use it now, but it suits what I need it to do almost perfectly. Why have I never used it before? Fair warning: If logical operations make your head hurt, take my word for it and move on to the code.

In a logical implication, if the first value is False, the result is always True; if the first value is True, the result is the same as the second value. (See MSDN for a complete result chart for the Imp operator.)

 

Expression1    Expression2    Result
True           True           True
True           False          False
False          True           True
False          False          True

 

 

In our case, we actually need the negative of this logic. For instance, we have the Boolean setting SKIP_EMPTY_LINES. The expression CBool(Len(strCurrentLine)) is True when the current line contains at least one character and False when it is empty. Combining the two gives the value we want to apply to the flag blnSkipThisLine, which determines whether the current line should be skipped.

The result we want is as follows:

 

SKIP_EMPTY_LINES    CBool(Len(strCurrentLine))    blnSkipThisLine
True                True                          False
True                False                         True
False               True                          False
False               False                         False

 

 

This is the negative of the logical implication, which means we can pair it with the logical negation operator Not to get the results we need. The same logic goes for SKIP_WHITESPACE_ONLY, except that we are using Trim() to remove the white space before checking the length.

I could have done this a different way, but when you get a chance to use something you have had your eye on for a long time, you just do it. (If I could think of a reason to use the StrReverse function, I think I would have used the whole language reference!)
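
For readers who want to check the truth tables by running something, here is a small Windows PowerShell sketch. PowerShell has no Imp operator, so the hypothetical helper Test-Implication spells the implication out as (-not First) -or Second; the loop then prints the negated result for all four combinations.

```powershell
# Hypothetical helper: PowerShell has no Imp operator, so implication is spelled out.
function Test-Implication {
    param([bool]$First, [bool]$Second)
    (-not $First) -or $Second
}

# Print Not (First Imp Second) for all four combinations.
foreach ($first in $true, $false) {
    foreach ($second in $true, $false) {
        $skip = -not (Test-Implication -First $first -Second $second)
        '{0,-5} {1,-5} -> skip = {2}' -f $first, $second, $skip
    }
}
```

The only combination that yields skip = True is True followed by False, so the negated implication is equivalent to "First and not Second", which is one of the different ways the test could have been written.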

          ' Use Logical implication to determine whether
          ' to skip lines based on white space
          If Not (SKIP_EMPTY_LINES Imp CBool(Len(strCurrentLine))) Then
                   blnSkipThisLine = True
          ElseIf Not (SKIP_WHITESPACE_ONLY Imp CBool(Len(Trim(strCurrentLine)))) Then
                   blnSkipThisLine = True
          Else

 

Now that we know that either the SKIP- settings weren’t specified or the line actually contains non-white space characters, we can go about processing them. The next setting to check is whether or not the user wanted to trim leading and trailing white space from the lines before calculating their length: 

                   ' Trim leading and trailing whitespace if specified
                   If TRIM_LINES Then
                             strCurrentLine = Trim(strCurrentLine)
                   End If

 

We might need to check that the values are unique based on the value of UNIQUE, which was passed in (or not) from the command line.

To remove duplicate values in VBScript, I always like to use a Dictionary object. In fact, 99 percent of the time when I use a Dictionary object, it’s for this purpose. In this case, we add the current line of text as a key in the Dictionary, and then we can check whether that item has been added before by using the Exists method. Because it’s the key we’re interested in, it doesn’t matter what value you assign to it, so I’m using Null.

                   ' Determine whether to skip the line based on uniqueness
                   ' if specified
                   If UNIQUE Then
                             ' If the line is in objDupeDictionary, then it has
                             ' already been encountered and is a duplicate.
                             If objDupeDictionary.Exists(strCurrentLine) Then
                                      blnSkipThisLine = True
                             Else
                                      ' This item is not a duplicate and needs to be
                                      ' added to the running list.
                                      objDupeDictionary(strCurrentLine) = Null
                             End If
                   End If
          End If
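
The same keys-only trick carries over to a Windows PowerShell hashtable. The sketch below uses made-up input and is meant only to illustrate the pattern: the key records that a line has been seen, and the value is never read.

```powershell
# Hypothetical input with repeated lines.
$lines = 'alpha', 'beta', 'alpha', 'gamma', 'beta'

$seen = @{}      # keys are the lines already encountered; values are unused
$unique = @()
foreach ($line in $lines) {
    if ($seen.ContainsKey($line)) {
        continue   # duplicate: skip it, as blnSkipThisLine does in the VBScript
    }
    $seen[$line] = $null
    $unique += $line
}
$unique
```

One difference to keep in mind: a hashtable created with @{} compares string keys without regard to case, whereas a Scripting.Dictionary uses a binary (case-sensitive) comparison unless its CompareMode is changed.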

 

At this point, we have definitively determined whether or not the current line should be skipped. If not, we’ll go ahead and process it: we’ll determine the length of the line (keeping track of min and max lengths so that we’ll know what range we’ve accumulated) and store the line information in another Dictionary object, which is keyed on the length of the line:

          ' If a line hasn't been marked to skip, then process it
          If Not blnSkipThisLine Then
                   intCurrentLineLength = Len(strCurrentLine)
                   ' Keep track of max and min line lengths for later use.
                   If intCurrentLineLength > intMaxLineLength Then
                             intMaxLineLength = intCurrentLineLength
                   End If
                   If intCurrentLineLength < intMinLineLength Then
                             intMinLineLength = intCurrentLineLength
                   End If
                   ' Pad the leading zeros on the text file line number so the
                   ' display will be aligned.
                   strPad = String(3-Len(intFileLineCounter), "0")
                   strCurrentLineNumber = strPad & intFileLineCounter
                   ' Prepend the line number to the line and format it for output.
                   strCurrentLine = "Line " & strCurrentLineNumber & ": """ & _
                             strCurrentLine & """"
                   ' If this line is a new length, then add it to the dictionary,
                   ' otherwise append it to the existing item for this same length.

 

If the current line length has already been keyed, we’ll append the new line to whatever lines were already determined to have the same length. Otherwise, we’ll add the new length key and assign the current line to it:

                   If objLengthDictionary.Exists(intCurrentLineLength) Then
                             objLengthDictionary(intCurrentLineLength) = _
                                      objLengthDictionary(intCurrentLineLength) & _
                                      vbNewLine & strCurrentLine
                   Else
                             objLengthDictionary(intCurrentLineLength) = strCurrentLine
                   End If
          End If
Loop

 

After this, you might (depending on the arguments passed) have a Dictionary that looks something like this:

 

Key    Value
40     Line 001: "Understanding Personal Information Cards"
       Line 011: "On the Start menu, click Control Panel."
       Line 013: "Double click the Windows CardSpace icon."
421    Line 002: "CardSpace provides users the ability to access,…
0      Line 003: ""
26     Line 004: "Personal Information Cards"
805    Line 005: "Personal Information Cards (also called Self-…
36     Line 007: "Creating a Personal Information Card"
77     Line 008: "To create a Personal Information Card, start the…

This isn’t technically a sort, but it’s as close to one as we need for the purpose of this scenario, because now we can start at the minimum length we recorded and check every increment from there until we get the three shortest lines. After that, we don’t really care about the order of the rest of the lines.
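For readers who want to see the bucket-and-scan idea in isolation, here is a minimal sketch in Python rather than VBScript. It is not the author's code: the function name shortest_line_groups and its parameters are hypothetical, and it assumes the lines have already been filtered. It groups lines by length and then walks upward from the minimum length until at least N lines have been covered, so that ties are reported together.

```python
from collections import defaultdict


def shortest_line_groups(lines, n_shortest=3):
    """Return [(length, [(line_number, text), ...]), ...] for the shortest lines.

    Hypothetical helper: mirrors the idea of a dictionary keyed on line
    length that is scanned from the minimum length upward.
    """
    by_length = defaultdict(list)  # length -> lines that have that length
    for number, text in enumerate(lines, start=1):
        by_length[len(text)].append((number, text))

    groups = []
    if not by_length:
        return groups

    covered = 0  # how many lines have been reported so far
    for length in range(min(by_length), max(by_length) + 1):
        if length not in by_length:
            continue  # no line has this length; try the next one
        groups.append((length, by_length[length]))
        covered += len(by_length[length])
        # Stop once enough lines are covered; a tie may push us past N.
        if covered >= n_shortest:
            break
    return groups


if __name__ == "__main__":
    sample = ["PPID", "Claims", "Street", "Gender", "Understanding"]
    for length, members in shortest_line_groups(sample):
        print(length, members)
```

As in the script, no real sort is performed; the cost is a scan over the range of lengths, which is acceptable when only the few shortest lines matter.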

The output

This brings us to the output section of the script:

' intShortLineCounter tracks which nth shortest line we're looking for.
' We start at 1 because the first line we're looking for is the 1st shortest
' line.
intShortLineCounter = 1
For intLengthCheckCounter = intMinLineLength To intMaxLineLength
          ' If this length is found in the Dictionary, then we'll use it.
          If objLengthDictionary.Exists(intLengthCheckCounter) Then
                   ' write the header for this particular grouping
                   wscript.echo
                   wscript.echo "Number " & intShortLineCounter & _
                             " shortest line(s) - Length=" & intLengthCheckCounter & ":"
                   wscript.echo "===================================="
                   ' Empty lines must be handled separately
                   If objLengthDictionary(intLengthCheckCounter)="" Then
                             wscript.echo objLengthDictionary(intLengthCheckCounter)
                             intShortLineCounter = intShortLineCounter + 1
                   Else
                             ' In the event of a tie, the dictionary value will
                             ' contain multiple values, so we need to separate them
                             ' using the Split function
                             aryCurrentLines = _
                                      Split(objLengthDictionary(intLengthCheckCounter), _
                                      vbNewLine)
                             For Each strCurrentLine in aryCurrentLines
                                      wscript.echo strCurrentLine
                                      intShortLineCounter = intShortLineCounter + 1
                             Next
                  
                   End If
                   ' Once the required number of shortest lines have been
                   ' discovered, then we're done.
                   If intShortLineCounter > N_SHORTEST_LINES Then
                             Exit For
                   End If
          End If
Next

 

It’s time to show what the output of the script is when you call it with the various arguments. The first one has no arguments, so it doesn’t skip any lines and also includes duplicates. Accordingly, there is a 16-way tie for the shortest line, and all of those lines have a length of zero:

CMD> cscript .\Adv_1.vbs //nologo

Number 1 shortest line(s) - Length=0:
====================================
Line 003: ""
Line 006: ""
Line 009: ""
Line 012: ""
Line 014: ""
Line 016: ""
Line 018: ""
Line 020: ""
Line 022: ""
Line 024: ""
Line 026: ""
Line 029: ""
Line 076: ""
Line 079: ""
Line 081: ""
Line 083: ""

When we add the /unique switch, the empty lines are aggregated and only the first empty line number is shown:

CMD> cscript .\Adv_1.vbs /unique //nologo

Number 1 shortest line(s) - Length=0:
====================================
Line 003: ""

Number 2 shortest line(s) - Length=1:
====================================
Line 033: " "

Number 3 shortest line(s) - Length=4:
====================================
Line 067: "PPID"

 

When we add the /trim switch, leading and trailing spaces are ignored, which also causes the single-space lines to be aggregated with the zero-length lines. Here, we have a three-way tie for third place:

CMD> cscript .\Adv_1.vbs /unique /trim //nologo

Number 1 shortest line(s) - Length=0:
====================================
Line 003: ""

Number 2 shortest line(s) - Length=4:
====================================
Line 067: "PPID"

Number 3 shortest line(s) - Length=6:
====================================
Line 027: "Claims"
Line 037: "Street"
Line 064: "Gender"

 

The /skipwhitelines switch removes all empty or effectively empty lines from the output, making line 67, “PPID”, the shortest line, followed by a three-way tie for second. Because four lines are then accounted for, there is no reason to move beyond second place.

 

CMD> cscript .\Adv_1.vbs /unique /trim /skipwhitelines //nologo

Number 1 shortest line(s) - Length=4:
====================================
Line 067: "PPID"

Number 2 shortest line(s) - Length=6:
====================================
Line 027: "Claims"
Line 037: "Street"
Line 064: "Gender"
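
The effect of the three switches can be summarized with a short sketch, again in Python rather than VBScript. This is an illustration under stated assumptions, not the script's actual logic: the function name prepare_lines and its keyword arguments are hypothetical, and the order in which the filters are applied (trim, then skip, then de-duplicate) is assumed from the outputs shown above.

```python
def prepare_lines(lines, unique=False, trim=False, skip_white_lines=False):
    """Return [(line_number, text), ...] after applying the chosen filters.

    Hypothetical helper; the filter order is an assumption.
    """
    seen = set()  # texts already kept, used only when unique=True
    kept = []
    for number, text in enumerate(lines, start=1):
        if trim:
            # Assumption: only leading and trailing spaces are removed.
            text = text.strip(" ")
        if skip_white_lines and text.strip() == "":
            continue  # empty or effectively empty line
        if unique:
            if text in seen:
                continue  # duplicate: keep only the first occurrence
            seen.add(text)
        kept.append((number, text))
    return kept


if __name__ == "__main__":
    sample = ["Title", "", " ", "", "PPID", " PPID "]
    print(prepare_lines(sample, unique=True, trim=True))
```

Because the original line numbers are carried along with the text, the first occurrence of a duplicated line keeps its own number, which matches the way only line 003 is listed for the empty lines when /unique is used.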

Confession time

Whew! That was a lot of code, and I’m tired. To be honest, I admit I would normally never have done this kind of heavy lifting with VBScript, which is so handicapped in the areas of filtering and sorting. Typically, I would just use the shell to execute the one or two lines of Windows PowerShell that it would take to do the exact same thing. This being the Scripting Games, though, it was a fun exercise, and I’m pretty happy with the result. Here is what the results look like when the script completes running.

Image of results of running script

 

Awesome work, Steven, Kirk, and Stephanie! What a terrific way to conclude the first day of the Summer Scripting Games 2009. Join us tomorrow as we reveal Event 7. Tomorrow, we will also have solutions from another stellar group of commentators for Event 2—the long jump. For all the latest Scripting Games information, follow us on Twitter. See you tomorrow.


Ed Wilson and Craig Liebendorfer, Scripting Guys

 

  • All the Scripting Games links in one location! Let the learning begin. Review Submitted Scripts Event

  • Dear Scripting Guys,

    I just worked a little bit through the description of event 1 and read the solutions here :-)

    Wonderful solutions, and I wonder, if I should add another, less brilliant, one to these!

    One note to Steven's solution:

    I would change the

    [regex]$regex = "^(?<Name>\w+?,\ \w+?)\s+(?<Country>.+?)\s+(?<Time>\d+\.\d+)$"

    to

    [regex]$regex = "^(?<Name>[^\t]+)\t(?<Country>[^\t]+)\t(?<Time>\d+\.\d+)$"

    Why? Not because this one is shorter :-) but because it is a little bit "more correct": you can't decide whether

    Hansen, Anne Grethe Japan 8.85

    is a two components first name or you have a country with two components like in:

    Pfeiffer, Michael United States 8.85

    In fact, at first sight I thought that this problem couldn't be solved, but a close look at the text file revealed that the three components are separated by TAB characters, which makes it possible to separate the three parts correctly!

    kind regards, Klaus

  • Klaus,

    Right you are.  I actually caught that a while ago, after I had submitted my solution, and wanted to change my regexes to

    PowerShell -

    "^(?<Name>\w+?,\ .+?)\t(?<Country>.+?)\s+(?<Time>\d+\.\d+)$"

    and VBScript should be

    "^(\w+?,\ .+?)\t(.+?)\s+(\d+\.\d+)$"

    but it was a bit too late.

  • Hi,

    What a great and simple solution for the advanced PowerShell event. The output will look a bit strange if lines that start with white space get enumerated.

    To prevent this, I put the text file in an array and trimmed away the white space for each entry. I then output and sorted the rest just like you did.

    regards,

    Patrik