How Can I Extract Information Located Between Two Known Values in a Text File?

Hey, Scripting Guy! In a text file, how can I extract the information that appears between the string HOSTNAME= and a blank space?

-- B

Hey, B. You’ll have to pardon us if we seem a little sluggish today; we’re still recovering from all the fun and excitement of Windows PowerShell Week. What if you missed out on Windows PowerShell Week? Well, then you really blew it, didn’t you? Now you’ll never get to learn about Windows PowerShell, you’ll lose your job, your dog will bite you, and your spouse will pack up and leave.

OK, so maybe there is at least one advantage to having missed Windows PowerShell Week, if you know what we mean.

Just a second, B ….

Sorry; apparently the Scripting Editor has discovered a typo in the first paragraph of today’s column. Instead of, “Well, then you really blew it, didn’t you?” that sentence should read like this: “That’s OK; after all, all the Windows PowerShell Week resources – including links to the on-demand webcasts, the question-and-answer logs, and other related goodies – are available on the Windows PowerShell Week home page.” So we guess you’re not out of luck after all.

And yes, that does mean that you have to keep your job and spouse, at least for the moment. On the other hand, your dog probably won’t bite you. And you can still have the opportunity to learn all about Windows PowerShell.

Of course, one thing that you won’t find on the Windows PowerShell Week home page is a script that can extract the text that appears between the string HOSTNAME= and a blank space in a line in a text file. But don’t fret. Fortunately, we just happened to have a script lying around that can extract this kind of information from a string variable. We’ll show you how this generic script works, then modify it slightly so that it can grab this same information from a text file.

Here’s the script:

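strSearchString = "ABCDEFGHIJK, NDPSGW PORT=LPR HOSTNAME=R2333_HP_1100 ABCDEFGHIJK"

' Find where HOSTNAME= begins, then skip past its 9 characters.
intStart = InStr(strSearchString, "HOSTNAME=")
intStart = intStart + 9

' Grab everything from that point on (up to 250 characters).
strText = Mid(strSearchString, intStart, 250)

' Copy characters one at a time until we reach a blank space.
For i = 1 to Len(strText)
    If Mid(strText, i, 1) = " " Then
        Exit For
    Else
        strData = strData & Mid(strText, i, 1)
    End If
Next

Wscript.Echo strData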

As you noted in your email, B, your text file includes lines that look something like this:

ABCDEFGHIJK, NDPSGW PORT=LPR HOSTNAME=R2333_HP_1100 ABCDEFGHIJK

You need to take these lines of text and find the string HOSTNAME=. Once you find this string you then want to grab all the text that appears after the equals sign, at least until you encounter a blank space. In this example, that means pulling out the text R2333_HP_1100. Let’s see if we can figure out how to do that.

Note. Before we go much further we should probably point out that there are several different ways that we could tackle this chore; for example, we could have used regular expressions to locate the value in question. The approach we chose isn’t necessarily the most sophisticated way of solving the problem, but we thought it was the easiest and most straightforward method; in addition, it was a method we thought the typical system administration scripter would be comfortable with. As usual, we’re less concerned with elegance than we are with simply finding a way to get the job done.
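For the curious, here is a minimal sketch of what the regular expression route might look like. This is not the approach we use in this column; the pattern and the variable names objRegEx and colMatches are our own choices for illustration, and we're assuming the value ends at the first blank space (or at the end of the line).

```vbscript
strSearchString = "ABCDEFGHIJK, NDPSGW PORT=LPR HOSTNAME=R2333_HP_1100 ABCDEFGHIJK"

' Capture everything after HOSTNAME= up to (but not including) the next blank space.
Set objRegEx = New RegExp
objRegEx.Pattern = "HOSTNAME=([^ ]*)"

Set colMatches = objRegEx.Execute(strSearchString)

' SubMatches(0) holds the text matched by the parentheses in the pattern.
strData = ""
If colMatches.Count > 0 Then
    strData = colMatches(0).SubMatches(0)
End If

Wscript.Echo strData
```

With the sample line this should echo R2333_HP_1100. The rest of this column sticks with the InStr and Mid approach from the script shown earlier.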

As you can see, we start out by assigning the sample line of text to a variable named strSearchString. We then use the InStr function in the following line of code to determine the starting position for the string HOSTNAME=:

intStart = InStr(strSearchString, "HOSTNAME=")

In this case, the InStr function is going to return the value 30. Why? Because the string HOSTNAME= can be found at character position 30 in the string strSearchString (if you count the characters you’ll see for yourself). Of course, knowing the starting position for HOSTNAME= is useful, but it doesn’t quite solve our problem: we need to know where the string ends. Why? Because that’s where we begin collecting data. Fortunately, that’s an easy obstacle to overcome: HOSTNAME= has 9 characters in it, meaning that all we have to do is add 9 to the variable intStart and we’ll know exactly where the target data begins. That’s what we do with this line of code:

intStart = intStart + 9
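If you'd rather not count characters by hand, a quick check along these lines lets VBScript do the counting. It's only a sketch for verifying the arithmetic on the sample line; using Len("HOSTNAME=") in place of the hard-coded 9 is our own substitution.

```vbscript
strSearchString = "ABCDEFGHIJK, NDPSGW PORT=LPR HOSTNAME=R2333_HP_1100 ABCDEFGHIJK"

' Where does HOSTNAME= begin, and what character is at that position?
intStart = InStr(strSearchString, "HOSTNAME=")
Wscript.Echo intStart                            ' 30
Wscript.Echo Mid(strSearchString, intStart, 1)   ' H

' Skip past the 9 characters in HOSTNAME= and look again.
intStart = intStart + Len("HOSTNAME=")
Wscript.Echo intStart                            ' 39
Wscript.Echo Mid(strSearchString, intStart, 1)   ' R
```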

Like we said, now we know exactly where the data starts: in this case, that’s character position 39 (30 + 9). (Admittedly, it might seem like the math is off by 1. But, again, count the number of characters and you’ll see that this is OK.) Among other things, that tells us that we can discard the first 38 characters in the string; there’s nothing in there of interest to us. With that in mind we use the following line of code to create a new string (stored in the variable strText) that starts at character 39 and encompasses the next 250 characters:

strText = Mid(strSearchString, intStart, 250)

Note. Why 250? Actually, that’s an arbitrary number: we just wanted to make sure that we grabbed all the remaining characters in the string. And yes, we could have calculated the length of the string and determined exactly how many characters to grab, but we’re assuming you won’t have any line lengths larger than 250 characters. If you do, then you’ll need to modify this line of code accordingly.

And don’t worry: if your string contains fewer than 250 characters this line of code won’t generate an error. Instead, VBScript will simply take as many characters as do exist and call it good.
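If the 250-character limit makes you nervous, one option is to leave the length out altogether: when Mid is called with only a string and a starting position, it returns everything from that position to the end of the string. A small sketch, using the same sample line:

```vbscript
strSearchString = "ABCDEFGHIJK, NDPSGW PORT=LPR HOSTNAME=R2333_HP_1100 ABCDEFGHIJK"
intStart = InStr(strSearchString, "HOSTNAME=") + 9

' With no length argument, Mid returns the rest of the string.
strText = Mid(strSearchString, intStart)
Wscript.Echo strText

' For a short line like this one, asking for 250 characters gives the same result.
If Mid(strSearchString, intStart, 250) = strText Then
    Wscript.Echo "Same result either way."
End If
```

The first Echo should display R2333_HP_1100 ABCDEFGHIJK, which is the value we want followed by the rest of the line.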

Next we set up a For Next loop that begins at 1 (representing the first character in the variable strText) and continues until we reach the end of the string (which we determine using the Len function):

For i = 1 to Len(strText)

What are we going to do inside this loop? Well, for starters, we’re going to check the very first character and see if it happens to be a blank space (remember, when we find a blank space we’ve reached the end of our data). If character 1 is a blank space we call the Exit For statement and exit the loop. If character 1 is not a blank space then we use the Mid function and this line of code to add the character to a variable named strData (in the parameters passed to Mid the variable i represents the position of the character in the string and the 1 tells Mid to grab only that one character):

strData = strData & Mid(strText, i, 1)

What does that mean? In this example, character 1 is the letter R. The letter R is not a blank space, so the value R gets added to strData. We then loop around and check the second character in the string: 2. Again, this is not a blank space, so we append this character to strData, making strData equal to R2. We then loop around and check the third character in the string, continuing until we either run out of characters or encounter a blank space.

After we exit the loop we then echo back the value of strData which, with any luck, should be this:

R2333_HP_1100

And there you have it.
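As we mentioned, there are other ways to do this. One possibility, sketched below, is to skip the loop and let InStr find the blank space for us: InStr accepts an optional starting position as its first argument, so we can begin the search where the data starts. The variable intEnd is our own addition, and like the script above this sketch assumes the line really does contain HOSTNAME=.

```vbscript
strSearchString = "ABCDEFGHIJK, NDPSGW PORT=LPR HOSTNAME=R2333_HP_1100 ABCDEFGHIJK"

intStart = InStr(strSearchString, "HOSTNAME=") + 9

' Look for the first blank space at or after the start of the data.
intEnd = InStr(intStart, strSearchString, " ")

If intEnd = 0 Then
    ' No blank space found: the data runs to the end of the line.
    strData = Mid(strSearchString, intStart)
Else
    ' Take only the characters between the start and the blank space.
    strData = Mid(strSearchString, intStart, intEnd - intStart)
End If

Wscript.Echo strData
```

For the sample line the blank space turns up at position 52, so Mid grabs 13 characters (52 - 39) and the script should echo R2333_HP_1100, the same value the loop produced.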

Now, how would you go about doing this in a text file? Why, like this, of course:

Const ForReading = 1

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)

Do Until objFile.AtEndOfStream
    ' Start each line with an empty result.
    strData = ""
    strSearchString = objFile.ReadLine

    intStart = InStr(strSearchString, "HOSTNAME=")

    ' Only process lines that actually contain HOSTNAME=.
    If intStart <> 0 Then
        intStart = intStart + 9
        strText = Mid(strSearchString, intStart, 250)

        For i = 1 to Len(strText)
            If Mid(strText, i, 1) = " " Then
                Exit For
            Else
                strData = strData & Mid(strText, i, 1)
            End If
        Next

        ' Echo the value only when the line contained HOSTNAME=.
        Wscript.Echo strData
    End If
Loop

objFile.Close

We won’t bother explaining the whys and wherefores of reading text files; we have plenty of Hey, Scripting Guy! columns that can help you with that. But now that you have the basic idea about how to locate the target information you shouldn’t have too much trouble figuring out how the text file version of this script works.
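If you find yourself pulling out more than one kind of value, it might be worth wrapping the logic in a function. The sketch below is one way to do that; GetTextBetween is a hypothetical name of our own, not a built-in VBScript function, and it uses InStr to find the ending delimiter rather than looping through the characters one by one.

```vbscript
' Hypothetical helper: returns the text between strKey and the next
' strDelimiter, or an empty string if strKey is not found in strLine.
Function GetTextBetween(strLine, strKey, strDelimiter)
    Dim intStart, intEnd

    GetTextBetween = ""
    intStart = InStr(strLine, strKey)
    If intStart = 0 Then Exit Function

    ' Skip past the key, then look for the delimiter from there.
    intStart = intStart + Len(strKey)
    intEnd = InStr(intStart, strLine, strDelimiter)

    If intEnd = 0 Then
        ' No delimiter: take everything to the end of the line.
        GetTextBetween = Mid(strLine, intStart)
    Else
        GetTextBetween = Mid(strLine, intStart, intEnd - intStart)
    End If
End Function

strLine = "ABCDEFGHIJK, NDPSGW PORT=LPR HOSTNAME=R2333_HP_1100 ABCDEFGHIJK"
Wscript.Echo GetTextBetween(strLine, "HOSTNAME=", " ")   ' R2333_HP_1100
Wscript.Echo GetTextBetween(strLine, "PORT=", " ")       ' LPR
```

Inside a Do Until loop that reads a text file, you could pass each line returned by objFile.ReadLine to the function and echo the result only when it isn't an empty string.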

We hope that helps, B. And, again, if any of you missed Windows PowerShell Week don’t worry: you can still revel in every minute of the week. Is it true that, during the day 5 webcast, Peter Costantini sang not one but two Windows PowerShell songs? Let’s put it this way: any time you hear a story about Peter, no matter how crazy, you should assume that it’s true.

But why not check out the day 5 webcast and see for yourself?

Comments
  • Trying to run the script at the bottom of this post that reads a file  I get an error message - what am I doing wrong?

    Error Message is as follows:

    Missing statement body in do loop.
    At 6 char:4
    + Do  <<<< Until objFile.AtEndOfStream
       + CategoryInfo          : ParserError: (do:String) [], ParseException
       + FullyQualifiedErrorId : MissingLoopStatement

    Contents of script:

    Const ForReading = 1

    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt")

    Do Until objFile.AtEndOfStream
       strData = ""
       strSearchString = objFile.ReadLine

       intStart = InStr(strSearchString, "HOSTNAME=")

       If intStart <> 0 Then
           intStart = intStart + 9
           strText = Mid(strSearchString, intStart, 250)

           For i = 1 to Len(strText)
               If Mid(strText, i, 1) = " " Then
                   Exit For
               Else
                   strData = strData & Mid(strText, i, 1)
               End If
           Next
       End If

       Wscript.Echo strData
    Loop

  • what script is this? .sh or?

  • This is a .vbs script as it is VBScript.

  • Do we have a Unix shell script that can check multiple text files and produce a list as output? I need to capture specific text from a group of files and check for any duplicates. A response would be highly appreciated. Thanks