How Can I Extract Text Embedded Between Target Phrases in a Text File?

Hey, Scripting Guy! How can I read a text file until I encounter a specified phrase, then write that information to a new text file? I then need to repeat this process, reading the text file to the next instance of the phrase and then writing that information to another new file. And so on.

-- EW

Hey, EW. You know, this has been a difficult year for the Scripting Guys. Back in February we said goodbye to Scripting Guy Peter Costantini, who left the team to take a position as a Program Manager here at Microsoft. Just a week or two ago we bid adieu to Scripting Guy Dean Tsaltas who, while still working at Microsoft, moved back to Halifax, Nova Scotia with the rest of his family. And now today we have the sad duty of saying goodbye to Scripting Guy Jean Ross.

Although Jean was a member of the team for a relatively short amount of time, there’s no way that we can understate her contributions both to the team and to the world of system administration scripting. As far as the Scripting Guy who writes this column is concerned, Jean Ross is, without question, the – sorry; could we put you on hold for a second here? Thanks.

Say that again? What do you mean you aren’t leaving the team? You have to leave the team; we wrote this moving tribute to you. Besides, if you don’t quit then we won’t have a column for today; we can’t very well publish a goodbye to Jean Ross if you aren’t leaving. You know, you’re really putting us in a very awkward position here; where’s your sense of team spirit?!

Um, excuse us folks; we seem to be having some … technical … difficulties here. While we sort out these issues why don’t we show you a script of some kind? What does this script do? Well, we’re not sure; something about reading a text file up to a specified point and writing that data to a text file, then doing the same thing up until the next specified point. But you guys are smart; you’ll figure it out.

Again, our apologies for the delay. We’ll be back with you as soon as we can. In the meantime:

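Const ForReading = 1

' Read the entire source file into memory, then close it
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)

strContents = objFile.ReadAll

objFile.Close

' Split the text on the target phrase; the phrase itself is discarded
arrContents = Split(strContents, "#This is the end of the file.#")

' Counter used to number the output files
i = 1

' Write each array item to its own file: NewFile_1.txt, NewFile_2.txt, ...
For Each strItem in arrContents
    strFileName = "C:\Scripts\NewFile_" & i & ".txt"
    Set objNewFile = objFSO.CreateTextFile(strFileName)
    objNewFile.Write strItem
    objNewFile.Close
    i = i + 1
Next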

Before we explain how this script works, let’s take a look at our sample text file, a file that looks something like this:

Here is some text in our document.#This is the end of the file.#Here is 
some more text in the document.#This is the end of the file.#Here is 
the last bit of text in the document. #This is the end of the file.#

Note. Yes, as a matter of fact Jean Ross did write this; it’s a rough draft for next month’s Sesame Script column. You can see why we’re going to miss her so much, can’t you?

Or at least we would miss her, if she’d ever actually leave.

If you look at this file closely, you’ll see that the phrase #This is the end of the file.# appears in a few different places. As you can probably guess, this is the target phrase, the one that separates one bit of information from another. In turn, that means that our script needs to do this:

Read the text Here is some text in our document.

After encountering the target phrase, save the preceding text to a text file (discarding the target phrase in the process).

Read the text Here is some more text in the document.

After encountering the target phrase, save that text to a text file.

Read the text Here is the last bit of text in the document. and then save that to a third text file.

So how hard is that going to be? As it turns out, not very hard at all.

We start the script off by defining a constant named ForReading, assigning this constant the value 1; we’ll use ForReading when we set out to open the text file C:\Scripts\Test.txt. Speaking of which, we then use the following two lines of code to create an instance of the Scripting.FileSystemObject and open Test.txt for reading:

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)

As soon as the file is open, we use the ReadAll method to read in the entire contents of that file, storing the data in a variable named strContents. Once those contents are safely stashed away in memory we call the Close method and close Test.txt. For good.
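
One thing worth knowing about ReadAll: if the file happens to be completely empty, ReadAll raises an "Input past end of file" error. Here is a minimal sketch of one way to guard against that, checking the AtEndOfStream property before reading. The function name ReadWholeFile is hypothetical (it is not part of the original script), and returning an empty string for an empty file is an assumption about what you would want.

```vbscript
Const ForReading = 1

' Hypothetical helper (not in the original script): returns the
' file's contents, or an empty string if the file has no content.
Function ReadWholeFile(strPath)
    Dim objFSO, objFile
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objFile = objFSO.OpenTextFile(strPath, ForReading)

    ' AtEndOfStream is True right away when the file is empty,
    ' and calling ReadAll at that point would raise an error.
    If objFile.AtEndOfStream Then
        ReadWholeFile = ""
    Else
        ReadWholeFile = objFile.ReadAll
    End If

    objFile.Close
End Function
```

With that in place, a line like strContents = ReadWholeFile("C:\Scripts\Test.txt") could stand in for the open, read, and close steps.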

All that serves as a prelude to this line of code:

arrContents = Split(strContents, "#This is the end of the file.#")

As we noted, our goal here is fairly straightforward: we need to identify – and save – all the text that comes between instances of our target phrase. Although there are several different ways we might be able to tease out this text, we decided that the simplest approach was to take the Split function and split the text file into an array named arrContents. Where exactly do we split the file? You got it: on each and every instance of the target phrase. (Not everyone is aware of this, but the Split function doesn’t limit you to splitting a string on a single character, say, a comma or a carriage return-linefeed.)
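
If you'd like to see the multi-character split in isolation, here is a small sketch. The sample string and the delimiter #END# are made up for illustration; they are not taken from the sample file.

```vbscript
' Made-up sample data: three words separated by a five-character delimiter
strSample = "alpha#END#beta#END#gamma"

' Split treats the whole string "#END#" as the delimiter
arrParts = Split(strSample, "#END#")

' Arrays returned by Split are zero-based, so the item count is UBound + 1
WScript.Echo "Items: " & (UBound(arrParts) + 1)

For Each strPart In arrParts
    WScript.Echo strPart
Next
```

Run under cscript.exe, this should echo an item count of 3 followed by alpha, beta, and gamma, each on its own line.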

So what do we gain by this? Well, after we call the Split function arrContents will contain the following items:

Here is some text in our document.

Here is some more text in the document.

Here is the last bit of text in the document.

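And you’re absolutely right: those are the three separate bits of information we need to extract from the file. (That is, those items represent the text that falls between instances of the target phrase.) The easiest way to get at that data was simply to call the Split function, something that discards the target phrase and stores the intervening text passages as items in an array. Strictly speaking, because our sample file ends with the target phrase, Split also returns a fourth item, one that is either empty or holds nothing but a trailing line break.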

The rest is easy. After assigning the value 1 to a counter variable named i, we set up a For Each loop that loops us through all the items in arrContents:

For Each strItem in arrContents

Inside that loop, we first use this line of code to construct a file path for our first text file:

strFileName = "C:\Scripts\NewFile_" & i & ".txt"

All we’re doing here is combining the string C:\Scripts\NewFile_ with the value of the counter variable and the string value .txt. That’s going to result in a file path like this:

C:\Scripts\NewFile_1.txt

Once we have the file path we can then call the CreateTextFile method to create a new text file:

Set objNewFile = objFSO.CreateTextFile(strFileName)

And what do you suppose we’re going to store in that text file? You got it: the value of the first item in the array arrContents. To get this data into the text file all we have to do is call the Write method, passing the variable strItem as the sole method parameter:

objNewFile.Write strItem

After that we close the file, increment the value of our counter variable by 1, and then repeat the process with the next item in the array. When all is said and done, our sample script will produce three text files holding the passages we wanted, one for each passage in the array:

C:\Scripts\NewFile_1.txt
C:\Scripts\NewFile_2.txt
C:\Scripts\NewFile_3.txt

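And that, as they say, is that. (Well, almost: because the sample file ends with the target phrase, the script as written also creates a NewFile_4.txt from the leftover item, a file that is empty or contains only a line break.)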
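
When a file ends with the target phrase, or when each target phrase sits at the end of a line, Split hands back items that are empty or that begin with a stray line break. That can show up as an empty file at the end and an extra blank line at the top of the others. What follows is one possible way to tidy that up, offered as a sketch rather than the definitive fix: strip whitespace and line breaks from both ends of each item, and skip any item that ends up empty. The names TrimWhitespace and WriteItems are hypothetical helpers, and passing True to CreateTextFile (to overwrite existing files) is an assumption about the behavior you want.

```vbscript
' Hypothetical helper: strips whitespace, including line breaks,
' from both ends of a string. Line breaks inside the text are kept.
Function TrimWhitespace(strText)
    Dim objRegEx
    Set objRegEx = New RegExp
    objRegEx.Global = True
    objRegEx.Pattern = "^\s+|\s+$"
    TrimWhitespace = objRegEx.Replace(strText, "")
End Function

' Hypothetical helper: writes each non-empty item to its own numbered
' file in strFolder and returns the number of files written.
Function WriteItems(arrContents, strFolder)
    Dim objFSO, objNewFile, strItem, strText, i
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    i = 0

    For Each strItem In arrContents
        strText = TrimWhitespace(strItem)

        ' Skip items that held nothing but whitespace or line breaks
        If Len(strText) > 0 Then
            i = i + 1
            Set objNewFile = objFSO.CreateTextFile( _
                objFSO.BuildPath(strFolder, "NewFile_" & i & ".txt"), True)
            objNewFile.Write strText
            objNewFile.Close
        End If
    Next

    WriteItems = i
End Function
```

To use it, a call such as WriteItems arrContents, "C:\Scripts" would take the place of the For Each loop in the main script. Note that this also trims any spaces at the edges of each passage; if those matter to you, the pattern would need adjusting.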

In the meantime we have good news for everyone: as it turns out, Scripting Guy Jean Ross isn’t leaving the team after all. (No, we tried paying her off; she took our money, then insisted on staying anyway.) And yes, that is good news for the scripting world; after all, if Jean was no longer a Scripting Guy then the scripting world would no longer reap the benefits of all the great things that Jean does.

Well, OK: all the great things that Jean undoubtedly will do.

Eventually.

Note. The odds are probably against this, but if there’s another set of Scripting Guys out there, and one of those Scripting Guys is named Jean Ross, and that Jean Ross is leaving the team, well, please let us know. We can make you one heck of a deal on a we’re-so-sad-to-see-Jean-go column.

  • Thanks for the script, but I have a problem. I am getting an empty .txt file at the end and an extra line at the top of the other created .txt files. Any suggestions?