Hey, Scripting Guy! How Can I Sort the Contents of a Text File in Numerical Order?

Hey, Scripting Guy! Question

Hey, Scripting Guy! I have a text file in which each line begins with a number. How can I sort the contents of that file by those numbers?

-- KT

Hey, Scripting Guy! Answer

Hey, KT. Tell you what: we’ll go ahead and answer your question. However, in return, you have to promise to forget everything we tell you. (Which is usually not a problem for most people.) What do you say: do we have a deal?

Admittedly, that’s kind of an odd bargain we’re trying to strike: we agree to answer someone’s question on the condition that they immediately forget the answer. Why would we try to make a deal like that? Well, to tell you the truth, our first thought upon reading this question was, “Wow. That would make a good Scripting Games event. In fact, we should use this in the 2009 Games.” The only problem, of course, is that we can’t provide the answer today, then try and use this question as a Scripting Games event tomorrow. Talk about a quandary. On the one hand, we need everyone to read today’s column; after all, if nobody reads the column then there’s not much use for the Scripting Guy who writes this column, is there?

Note. Yes, there are plenty of people – including his fellow Scripting Guys – who wonder if there’s ever any need for the Scripting Guy who writes this column.

On the other hand, we need you to forget both the question and the answer; that way we can surprise everyone by using this query as an event in the 2009 Scripting Games. (The events have already been finalized for the 2008 Games, February 15th through March 3rd right here in the Script Center.)

To make a long story short, that’s why we need everyone to – you know what? Never mind; go ahead and remember both the question and the answer if you want to. After all, the odds are pretty good that even if you upheld your end of the bargain the Scripting Guys would forget all about it and wouldn’t use it as an event in the 2009 Scripting Games anyway.

Sadly, forgetfulness is a part of growing old. For some reason, that’s something we can never remember to tell the other Scripting Guys.

Fortunately, though, we did remember to wear pants this morning (we’d rather not talk about that) and to write a script that can sort the lines in a text file based on a number that appears at the beginning of each line. As a matter of fact, we remembered to write this script:

' File access modes for OpenTextFile
Const ForReading = 1
Const ForWriting = 2

' ADO constants for defining the recordset fields
Const adVarChar = 200
Const MaxCharacters = 255
Const adFldIsNullable = 32
Const adInteger = 3

' Create a disconnected recordset with a text field and a number field
Set DataList = CreateObject("ADOR.Recordset")
DataList.Fields.Append "TextLine", adVarChar, MaxCharacters, adFldIsNullable
DataList.Fields.Append "NumberLine", adInteger, , adFldIsNullable
DataList.Open

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)

' Read each line and collect the digits that appear at the start of it
Do Until objFile.AtEndOfStream
    strLine = objFile.ReadLine
    intLength = Len(strLine) 
    strValue = ""
    For i = 1 to intLength
        If IsNumeric(Mid(strLine, i, 1)) Then
            strValue = strValue & Mid(strLine, i, 1)
        Else
            Exit For
        End If
    Next
    ' Store the full line along with its leading number
    DataList.AddNew
    DataList("TextLine") = strLine
    DataList("NumberLine") = strValue
    DataList.Update
Loop

objFile.Close

' Reopen the same file for writing; this replaces its existing contents
Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForWriting)

' Sort numerically on the leading number
DataList.Sort = "NumberLine ASC"

DataList.MoveFirst

' Write the lines back to the file in sorted order
Do Until DataList.EOF
    objFile.WriteLine DataList.Fields.Item("TextLine") 
    DataList.MoveNext
Loop

objFile.Close

Before we go much further we should note that this is actually a slightly more complicated script than KT originally asked for. Why did we give him more than he asked for? As most of
you know, sorting data in VBScript is never much fun; that’s because VBScript has no built-in mechanism for sorting data. On top of that, even when you can sort data you run into the
problem of whether VBScript sees your data as being composed of numbers or letters. For example, suppose we have the following values:

2 BBB
1 AAA
10 DDD
3 CCC

If VBScript sees these values as being numbers, then it will sort the values like this:

1 AAA
2 BBB
3 CCC
10 DDD

However, if VBScript thinks it’s dealing with string values, then we’re going to get back a list like this one:

1 AAA
10 DDD
2 BBB
3 CCC

Yuck.
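If you'd like to see the difference for yourself, here's a minimal sketch (not part of the script above) that compares the same two values first as strings and then as numbers. It assumes you run it with cscript.exe so that WScript.Echo writes to the command window; the messages themselves are just illustrative text.

```vbscript
' Two string values are compared character by character,
' so "10" comes before "2" because "1" comes before "2".
If "10" < "2" Then
    WScript.Echo "As strings: 10 sorts before 2"
End If

' Two numeric values are compared by magnitude.
If 2 < 10 Then
    WScript.Echo "As numbers: 2 sorts before 10"
End If

' CLng converts a numeric string to a number, which changes the comparison.
If CLng("2") < CLng("10") Then
    WScript.Echo "After CLng: 2 sorts before 10"
End If
```

All three messages get echoed, which is the whole problem in a nutshell: the same characters sort differently depending on whether they're treated as text or as numbers.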

In KT’s case this isn’t a problem; that’s because the numbers in his text file only go up to 9. As he noted, he needs to sort the lines in the file “based on
the value of the first character in the line, which is always a number.” If you’re only sorting on the first character, well, there aren’t many single-digit numbers
greater than 9.

Well, OK, math fans: there aren’t many base-10 single-digit numbers greater than 9.

We have a feeling, however, that many of you have text files that include line numbers greater than 9; in that case, sorting on just the first digit of a number
like 147 doesn’t do you a lot of good. Therefore, we came up with a script that’s a tiny bit more complicated, but way more flexible: it should work in any
situation and with any numbers.

But don’t worry; if you do want a script that only works with values up to 9, well, we’ve tacked that on to the end of today’s column.

As for the fancier and slightly more complicated script, we start out by defining a bunch of constants. The first two – ForReading and ForWriting – are needed
when we read from and write to the text file. The other four fill the following roles:

  • adVarChar. This constant enables us to create a recordset field that uses the variable-length string (VarChar) data type. This field will hold the actual text of
    each line in the text file.

  • MaxCharacters. This constant sets the maximum size of that string field to 255 characters; a line longer than that won’t fit in the field.

  • adFldIsNullable. This constant allows us to have Null values in a field. We don’t really need it for today’s script, but it’s a handy thing to know about. (And
    putting it in doesn’t hurt anything.)

  • adInteger. This constant lets us create a recordset field that uses the Integer data type. We’ll use this field to keep track of the numbers that appear at
    the start of each line.

Our next task is to create a “disconnected recordset,” which is – for all intents and purposes – a database table that exists only in memory (that is, it’s not tied
to a physical database). We aren’t going to discuss disconnected recordsets in any detail today; if you’d like more information, you might take a look at
this section of the Microsoft Windows 2000 Scripting Guide. For now, all we’ll do is point out that the following chunk of code creates a new disconnected
recordset, adds a variable-length string field named TextLine and an integer field named NumberLine, and then opens the recordset for use:

Set DataList = CreateObject("ADOR.Recordset")
DataList.Fields.Append "TextLine", adVarChar, MaxCharacters, adFldIsNullable
DataList.Fields.Append "NumberLine", adInteger, , adFldIsNullable
DataList.Open
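If you want to double-check what those four lines actually built, here's a small sketch that creates the same recordset and then lists each field's name, type code, and defined size. The variable name objField is our own choice for this example, and we're assuming the script is run with cscript.exe.

```vbscript
Const adVarChar = 200
Const MaxCharacters = 255
Const adFldIsNullable = 32
Const adInteger = 3

' Build the same disconnected recordset used in the main script.
Set DataList = CreateObject("ADOR.Recordset")
DataList.Fields.Append "TextLine", adVarChar, MaxCharacters, adFldIsNullable
DataList.Fields.Append "NumberLine", adInteger, , adFldIsNullable
DataList.Open

' List each field's name, ADO type code, and defined size.
For Each objField In DataList.Fields
    WScript.Echo objField.Name & " - type " & objField.Type & ", size " & objField.DefinedSize
Next

DataList.Close
```

The type codes that come back should match the constants we passed in: 200 for the string field and 3 for the integer field.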

As soon as we have our recordset we can go ahead and open the text file C:\Scripts\Test.txt for reading; that’s what these two lines of code are for:

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)

Once the file is open, we then set up a Do loop designed to run until we’ve read every line in the file (in other words, until the file’s AtEndOfStream property
is True):

Do Until objFile.AtEndOfStream

As you might expect, all the excitement (well, most of it, anyway) takes place inside this Do loop. To begin with, we use the ReadLine method to read the first
line in the text file, storing that value in a variable named strLine. We then use VBScript’s Len function to determine the total number of characters in that
line:

intLength = Len(strLine)

Why do we need to know that? Well, we know that the first x characters in the line are numbers; however, we have no idea what x is equal to. Therefore,
we’re going to examine each line character-by-character, not stopping until we encounter a non-numeric value. For example, suppose the first line in our text
file looks like this:

124This is a line in the text file.

We’ll grab the 1, the 2, and the 4, and won’t stop grabbing characters until we reach the T, the first non-numeric value.

So how do we do all that? Well, for starters, we assign an empty string to a variable named strValue:

strValue = ""

We then execute the following For Next loop:

For i = 1 to intLength
    If IsNumeric(Mid(strLine, i, 1)) Then
        strValue = strValue & Mid(strLine, i, 1)
    Else
        Exit For
    End If
Next

What we’re doing here is looping from 1 to the total number of characters in the line (intLength). Inside this loop, we use the Mid function to grab the character at position i, and the IsNumeric function to determine
whether or not that character (the first character in the line, the first time through the loop) is a number. If it is, we append that character to the variable strValue:

strValue = strValue & Mid(strLine, i, 1)

In other words, the first time through the loop, strValue will be equal to 1. We then repeat the process with the next character in the line. If this, too, is a
number (and it is) we append that value to the variable strValue. That makes strValue equal to this: 12. This continues until we find a non-numeric character.
As soon as we encounter a non-numeric character we call the Exit For statement and exit the For Next loop.

So what did that do for us? That enabled us to extract the number that appears at the beginning of each line. Now that we know that number we can use this
block of code to add a new record to our recordset, a record consisting of the entire text of the line (TextLine) and the number that appears at the beginning
of the line:

DataList.AddNew
DataList("TextLine") = strLine
DataList("NumberLine") = strValue
DataList.Update

In other words, our recordset now contains the following record:

TextLine                               NumberLine
124This is a line in the text file.    124

And then it’s on to the next line in the text file.
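By the way, the digit-grabbing loop described above can also be wrapped up in a function. What follows is just a sketch: GetLeadingNumber is a hypothetical name we made up for this example, not something the main script uses. It also does one thing the main script doesn't do, which is return Null when a line (a blank line, say) doesn't start with a number at all.

```vbscript
' GetLeadingNumber is a hypothetical helper; it is not part of the original script.
' It returns the number at the start of strLine, or Null if there isn't one.
Function GetLeadingNumber(strLine)
    Dim i, strChar, strValue
    strValue = ""
    For i = 1 To Len(strLine)
        strChar = Mid(strLine, i, 1)
        If IsNumeric(strChar) Then
            strValue = strValue & strChar
        Else
            Exit For
        End If
    Next
    If strValue = "" Then
        GetLeadingNumber = Null
    Else
        ' CLng turns the collected digits into a Long integer.
        GetLeadingNumber = CLng(strValue)
    End If
End Function

' Echoes the number 124.
WScript.Echo GetLeadingNumber("124This is a line in the text file.")

' A line with no leading digits comes back as Null.
If IsNull(GetLeadingNumber("No number here")) Then
    WScript.Echo "No leading number found."
End If
```

Because the NumberLine field was created with adFldIsNullable, a Null value like this one could be stored in it; that's one place where that constant might come in handy.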

After we’ve read all the lines from the text file (and added each line to our recordset) we close the file and then immediately reopen it, this time for writing. All
it takes now is one little line of code to sort the recordset:

DataList.Sort = "NumberLine ASC"

As you can see, we’re sorting by the field NumberLine; that’s the field that contains the number that appears at the beginning of each line. We’ve also chosen
to sort the data in ascending (1 to 100) order; that’s what the ASC is for. Would you rather sort the recordset in descending order (100 to 1)? Then use this
line of code instead:

DataList.Sort = "NumberLine DESC"

All we have to do now is replace the existing contents of our text file with these new, sorted contents. To do that, we use the MoveFirst method to move to
the first record in our recordset. We then use this block of code to loop through all the items in that recordset, using the WriteLine method to add the values
of the TextLine field to the file:

Do Until DataList.EOF
    objFile.WriteLine DataList.Fields.Item("TextLine") 
    DataList.MoveNext
Loop

Will that really work? Well, suppose we have the following text file:

14tttt
2aaaa
8ffff
6vvvv
4cccc
12xxxx
11zzzz
9qqqq

Here’s what the same file looks like after we run our script:

2aaaa
4cccc
6vvvv
8ffff
9qqqq
11zzzz
12xxxx
14tttt

Well, what do you know: it did work!
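If you'd rather try this without touching a file on disk, here's a sketch that runs the same logic against the sample lines above, held in an array, and echoes the sorted result instead of writing it back. The array name arrLines is our own, and we're assuming the script is run with cscript.exe.

```vbscript
Const adVarChar = 200
Const MaxCharacters = 255
Const adFldIsNullable = 32
Const adInteger = 3

Set DataList = CreateObject("ADOR.Recordset")
DataList.Fields.Append "TextLine", adVarChar, MaxCharacters, adFldIsNullable
DataList.Fields.Append "NumberLine", adInteger, , adFldIsNullable
DataList.Open

' The sample lines from the text file, held in memory instead of read from disk.
arrLines = Array("14tttt", "2aaaa", "8ffff", "6vvvv", "4cccc", "12xxxx", "11zzzz", "9qqqq")

For Each strLine In arrLines
    ' Collect the digits at the start of the line, as in the main script.
    strValue = ""
    For i = 1 To Len(strLine)
        If IsNumeric(Mid(strLine, i, 1)) Then
            strValue = strValue & Mid(strLine, i, 1)
        Else
            Exit For
        End If
    Next
    DataList.AddNew
    DataList("TextLine") = strLine
    DataList("NumberLine") = strValue
    DataList.Update
Next

' Sort on the numeric field, then echo the lines in sorted order.
DataList.Sort = "NumberLine ASC"
DataList.MoveFirst

Do Until DataList.EOF
    WScript.Echo DataList.Fields.Item("TextLine").Value
    DataList.MoveNext
Loop
```

The lines should be echoed in the same order as the sorted file shown above, starting with 2aaaa and ending with 14tttt.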

That should take care of your problem, KT. In a way, though, that’s a shame: like we said, that would have been a pretty good event in the 2009 Scripting
Games. After all, it’s not impossible to solve, but it’s also not immediately obvious what the solution might be. That, of course, is what the Scripting Games are all about.

Speaking of which, have we mentioned that the 2008 Winter Scripting Games begin February 15th? You don’t want to miss it; after all, we expect this
year’s Games to be the biggest and the best competition ever.

OK, admittedly, we’ve only had two prior editions of the Scripting Games, and the first one was tossed together in about a week. Nevertheless, this year’s
Games – which includes a Perl division as well as the VBScript and Windows PowerShell divisions – should be a lot of fun: the events are a bit more challenging
(but still doable); we have some “bonus” activities planned for those of you who run through all 10 events in a single day (unlike the Scripting Guys, who
typically take 10 days to run through a single event); and we have a lot of cool prizes to give away. And remember, we have a Beginners division as well as
an Advanced division. That means there’s a little something for everyone. Even you.

OK, OK, maybe not him, and maybe not her. But there’s definitely something for you.

In other words, we’d like as many people as possible to participate in the Scripting Games. Mark your calendar now. Or, even better, wait till tomorrow; in
tomorrow’s column, we’ll give you a script that can mark your calendar for you.

Hmmm …. You know, that might be a good Scripting Games event, too ….

At any rate, that should do it for today – oh, right. As promised, here’s a simpler version of the script, one that works only with values from 0 to 9:

Const ForReading = 1
Const ForWriting = 2

Const adVarChar = 200
Const MaxCharacters = 255
Const adFldIsNullable = 32

Set DataList = CreateObject("ADOR.Recordset")
DataList.Fields.Append "TextLine", adVarChar, MaxCharacters, adFldIsNullable
DataList.Open

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)

Do Until objFile.AtEndOfStream
    strLine = objFile.ReadLine
    DataList.AddNew
    DataList("TextLine") = strLine
    DataList.Update
Loop

objFile.Close

Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForWriting)

DataList.Sort = "TextLine ASC"

DataList.MoveFirst

Do Until DataList.EOF
    objFile.WriteLine DataList.Fields.Item("TextLine") 
    DataList.MoveNext
Loop

objFile.Close

And that should do it for today. See you all tomorrow.

Comments

  • Hey,

    I was using this script.

    and the file i am using to sort is more than 50 mb.

    However when i run this script, i get a out of memory error in the line :

    DataList.Sort = "TextLine ASC"

    Please help.