Hey, Scripting Guy! How Can I Extract Specific Information From a Word Document and Then Use That Information to Rename the Document?

Hey, Scripting Guy! I have a bunch of Word documents in a folder. I need to open each document, read in some information from the first two lines, then use that information to rename the file. However, I’m having some trouble getting this to work. Can you help me?

-- TG

Hey, TG. As devoted readers of Hey, Scripting Guy! know, the Scripting Guy who writes this column is usually all business: he never lets anything distract him from his job of answering questions about system administration scripting. Today, however, we have to make an exception: the following story was just too weird to pass up.

Last week a 66-year-old man died in his sleep. His roommates – in what must have been some sort of tribute to their deceased comrade – decided to try to cash the man’s most recent Social Security check. When they did, however, they were told that the person whose name was on the check had to be present before the check could be cashed. The two men responded by returning home, dressing the deceased the best they could, then plopping the body into a desk chair and wheeling him back to the check cashing store. They left the dead man on the sidewalk, then went back inside the store and tried to cash the check.

To make a long – and weird – story short, a crowd gathered, a policeman came by to see what the hubbub was all about, and the two men were arrested. The pair claimed they had no idea that their friend was dead. “He looked like that every morning,” said one man. “I didn’t know he was dead. He had $500 in his pocket. I had $200. Why would I rob the guy?”

In other words, their friend looked like that – dead – every morning. Hard to believe? Maybe. But, then again, none of you have ever seen the Scripting Guy who writes this column first thing in the morning.

Like we said, that was just too weird to pass up. But now it’s time to regain our focus. TG is looking for a script that can open a Microsoft Word document (actually a whole slew of Word documents), grab some information from the first two lines, then use that information to rename the file. How can he do that? Why, like this, of course:

Set objWord = CreateObject("Word.Application")
objWord.Visible = True

Set objDoc = objWord.Documents.Open("C:\Scripts\Test.doc")

strText = objDoc.Paragraphs(1).Range.Text
arrText = Split(strText, vbTab)
intIndex = Ubound(arrText)
strUserName = arrText(intIndex)

arrUserName = Split(strUserName, " ")
intLength = Len(arrUserName(1))
strName = Left(arrUserName(1), intlength - 1)

strUserName = strName & ", " & arrUserName(0)

strText = objDoc.Paragraphs(2).Range.Text
arrText = Split(strText, vbTab)
intIndex = Ubound(arrText)

strDate = arrText(intIndex)
strDate = Replace(strDate, "/", "")

intLength = Len(strDate)
strDate = Left(strDate, intlength - 1)

strFileName = "C:\Scripts\" &  strUserName & " " & strDate & ".doc"

objWord.Quit

Wscript.Sleep 5000

Set objFSO = CreateObject("Scripting.FileSystemObject")
objFSO.MoveFile "C:\Scripts\Test.doc", strFileName

To tell you the truth, this turned out to be a tiny bit more complicated than we initially thought; that’s due to the way paragraphs are formatted in Word. Of course, the fact that this script is a tiny bit complicated could also be due to the fact that – more often than not – the Scripting Guy who writes this column has no real idea what he’s doing, and today’s column was no exception. But everyone should be used to that by now.

The script starts out in simple enough fashion, creating an instance of the Word.Application object and then setting the Visible property to True; that gives us a running instance of Word that we can see on screen. We then use this line of code to open the document in question (C:\Scripts\Test.doc):

Set objDoc = objWord.Documents.Open("C:\Scripts\Test.doc")

As TG noted in his email, this document (like all the other Word documents in this folder) starts out in the same fashion; the first two lines in each document look something like this:

Person Name        Ken Myer
EncounterDate      01/01/08

Is that good news? You bet it is. That means that we can grab the user name simply by reading in the Text property of the first paragraph in the document. In fact, that means we can grab the user name simply by executing the following line of code:

strText = objDoc.Paragraphs(1).Range.Text

That’s also good … sort of. The one problem here is that we don’t just have the user name, we have this value:

Person Name        Ken Myer

Talk about too much information, eh? On top of that, when we go to rename the file we want the user name to look like this:

Myer, Ken

So what does all that mean? That means that we still have some work to do.

According to TG, there are two tab characters separating Person Name and Ken Myer (the person’s name). With that in mind, we can use the Split function to turn this string value into an array, splitting the string on the Tab character (represented by the VBScript constant vbTab):

arrText = Split(strText, vbTab)

That’s going to give us an array with three items. The Tab characters themselves are discarded by Split; the empty item in the middle is what sits between the two consecutive tabs:

Person Name
(empty string)
Ken Myer

As you can see, the user name is the last item in the array. How can we actually retrieve that value? Well, for starters, we can use this line of code, and the UBound function, to determine the index number of the last item in the array:

intIndex = Ubound(arrText)

And then we can use this line of code to grab the user name and store it in a variable named strUserName:

strUserName = arrText(intIndex)
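
To see what Split and UBound actually hand back, here is a minimal sketch that skips Word entirely. It assumes the paragraph text is the label, two tabs, the name, and the end-of-paragraph mark (represented here by vbCr); the literal string is only a stand-in for objDoc.Paragraphs(1).Range.Text:

```vbscript
' Stand-in for objDoc.Paragraphs(1).Range.Text. Assumed layout:
' label, two tabs, value, end-of-paragraph mark.
strText = "Person Name" & vbTab & vbTab & "Ken Myer" & vbCr

arrText = Split(strText, vbTab)

' The tabs are delimiters, so they are not stored in the array;
' two consecutive tabs leave one empty item between them.
intIndex = UBound(arrText)          ' index of the last item
strUserName = arrText(intIndex)     ' the name plus the trailing vbCr

WScript.Echo "Items in array: " & (intIndex + 1)
WScript.Echo "Middle item is empty: " & (arrText(1) = "")
WScript.Echo "Last item without the mark: " & Replace(strUserName, vbCr, "")
```

Because UBound always points at the last item, the same two lines keep working if a document happens to use one tab or three instead of two.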

Now we’re getting somewhere. Next we have to figure out a way to reformat this name (that is, turn Ken Myer into Myer, Ken). To do that, we start by creating yet another array, this one splitting strUserName on the blank space between Ken and Myer:

arrUserName = Split(strUserName, " ")

That creates a new array, with item 0 equal to Ken and item 1 equal to Myer.

This, by the way, is where we ran into a problem. (Not as big a problem as the two guys trying to cash a dead man’s Social Security check, but a problem nonetheless.) Because the user’s last name comes at the end of a line it has an end-of-paragraph mark appended to it; that means that array item 1 is actually equal to this, with the asterisk representing the end-of-paragraph mark:

Myer*

If we hope to get output that’s actually readable we need to get rid of that last character. (Trust us on this one; we learned that the hard way.) One easy way to get rid of that character is to first use the Len function to determine the total number of characters in the string:

intLength = Len(arrUserName(1))

Once we’ve done that we can then use the Left function to grab all the characters in the string except the last one (the length of the string minus 1). That’s what we do here:

strName = Left(arrUserName(1), intlength - 1)

Now, at long last, we can reformat our name using this line of code, storing the value in the variable strUserName:

strUserName = strName & ", " & arrUserName(0)
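
If you would rather keep the name handling in one place, the steps above could be wrapped up roughly as follows. This is only a sketch: FormatUserName is a hypothetical helper, not part of the original script, and it swaps the Len/Left trick for Replace on vbCr (assuming the end-of-paragraph mark is a carriage return). It also assumes the name is exactly two words separated by one space:

```vbscript
' Hypothetical helper: turns "Ken Myer" & vbCr into "Myer, Ken".
Function FormatUserName(strRaw)
    Dim strClean, arrParts

    ' Remove the end-of-paragraph mark, then any stray spaces.
    strClean = Trim(Replace(strRaw, vbCr, ""))
    arrParts = Split(strClean, " ")

    If UBound(arrParts) = 1 Then
        ' Exactly two words: last name, comma, first name.
        FormatUserName = arrParts(1) & ", " & arrParts(0)
    Else
        ' Unexpected shape (one word, middle names): leave it as-is.
        FormatUserName = strClean
    End If
End Function

WScript.Echo FormatUserName("Ken Myer" & vbCr)
```

Using Replace rather than Left means the helper does not care whether the mark is actually present, which makes it a little more forgiving of odd input.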

That takes care of line 1 in the Word document. For line 2, we use a similar approach, this time grabbing the Text of the second paragraph in the file:

strText = objDoc.Paragraphs(2).Range.Text

After isolating the date portion of the string we then use this line of code to replace all the /’s in the date with, well, nothing:

strDate = Replace(strDate, "/", "")

Note. Why do we do that? That’s right: because you can’t use the / in a file name.

And neither can we.

We next use these two lines of code to remove the end-of-paragraph mark from the date:

intLength = Len(strDate)
strDate = Left(strDate, intlength - 1)
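
The date cleanup can be sketched the same way. CleanDate is a hypothetical helper, not something from the original script, and it again assumes the end-of-paragraph mark is a carriage return (vbCr) rather than trimming the last character blindly:

```vbscript
' Hypothetical helper: turns "01/01/08" & vbCr into "010108".
Function CleanDate(strRaw)
    Dim strClean

    strClean = Replace(strRaw, "/", "")      ' slashes are not allowed in file names
    strClean = Replace(strClean, vbCr, "")   ' drop the end-of-paragraph mark
    CleanDate = Trim(strClean)
End Function

WScript.Echo CleanDate("01/01/08" & vbCr)
```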

Got all that? Good. Let’s take a moment to recap where we are. We now have a variable (strUserName) equal to this:

Myer, Ken

We also have a second variable (strDate) equal to this:

010108

Now that we have those two items, we can go ahead and construct a new file path using this line of code:

strFileName = "C:\Scripts\" &  strUserName & " " & strDate & ".doc"

That’s going to result in strFileName being equal to this:

C:\Scripts\Myer, Ken 010108.doc

Which is just exactly what we wanted it to be equal to.

At this point we no longer have any need for the Word document; consequently, we call the Quit method to terminate Microsoft Word:

objWord.Quit

And then we use this line of code to pause the script for 5 seconds:

Wscript.Sleep 5000

Why do we pause the script? Well, if we close Word and then immediately try to rename the file we’re likely to get an “Access Denied” error; that’s going to happen if Word hasn’t fully terminated, and thus still has a lock on the file. Pausing for 5 seconds gives Word enough time to close before we rename the file.
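
A fixed 5-second pause is the approach this script takes. As an alternative (not the method used here), you could retry the rename a few times and stop as soon as it succeeds. The sketch below assumes a hypothetical helper named TryMoveFile and arbitrary values of 10 attempts and a half-second wait; the file names are just the ones from this column:

```vbscript
' Hypothetical helper: attempts the move up to intAttempts times,
' waiting half a second after each failure. Returns True on success.
Function TryMoveFile(objFSO, strSource, strTarget, intAttempts)
    Dim i
    TryMoveFile = False

    For i = 1 To intAttempts
        On Error Resume Next
        objFSO.MoveFile strSource, strTarget
        If Err.Number = 0 Then
            On Error GoTo 0
            TryMoveFile = True
            Exit Function
        End If
        Err.Clear
        On Error GoTo 0
        WScript.Sleep 500
    Next
End Function

Set objFSO = CreateObject("Scripting.FileSystemObject")
strSource = "C:\Scripts\Test.doc"
strTarget = "C:\Scripts\Myer, Ken 010108.doc"

If Not TryMoveFile(objFSO, strSource, strTarget, 10) Then
    WScript.Echo "Could not rename " & strSource
End If
```

The trade-off is a few more lines of code in exchange for not waiting the full 5 seconds when Word closes quickly.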

Speaking of which, these two lines of code rename the file:

Set objFSO = CreateObject("Scripting.FileSystemObject")
objFSO.MoveFile "C:\Scripts\Test.doc", strFileName

As you can see, this part of the script isn’t the least bit complicated. All we do here is create an instance of the Scripting.FileSystemObject object, then use the MoveFile method to rename the file.

Note. OK, maybe it’s not complicated, but it is a little confusing. For some reason, the FileSystemObject doesn’t have a Rename method; instead it requires you to “move” the file from one path to another. If those paths are in the same folder, however, that effectively causes the file to be renamed. (Give it a try and you’ll see what we mean.)
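
For what it’s worth, there is another route to the same result: the File object returned by GetFile has a Name property that can be assigned a new value. The sketch below is an alternative to MoveFile, not what the script in this column does, and it assumes the file exists and that Word has released its lock on it:

```vbscript
Set objFSO = CreateObject("Scripting.FileSystemObject")

' Bind to the existing file, then assign a new name.
' The file stays in the same folder; only its name changes.
Set objFile = objFSO.GetFile("C:\Scripts\Test.doc")
objFile.Name = "Myer, Ken 010108.doc"
```

Note that Name takes just the file name, not a full path.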

By the way, we should also note that this script is designed to run only on the local computer. Could it be redesigned to work against the files on a remote computer? Yes, but it can be a little tricky to use Word with files on a remote machine. But what the heck: if this is something you’d like us to write about, well, just let us know. We’ll see what we can do.

This is actually a very nice little script, except for one thing: what TG really wanted to do was be able to run this script against all the files in a particular folder. Hey, no problem. Now that you have a basic understanding of what the script does, and how it does it, we can show you the do-this-for-all-the-files-in-a-folder version, and without any additional explanation. Enjoy!

Oh, right; here’s the script:

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFSO.GetFolder("C:\Temp")

Set objWord = CreateObject("Word.Application")
objWord.Visible = True

For Each objFile in objFolder.Files
    Set objDoc = objWord.Documents.Open(objFile.Path)

    strText = objDoc.Paragraphs(1).Range.Text
    arrText = Split(strText, vbTab)
    intIndex = Ubound(arrText)
    strUserName = arrText(intIndex)

    arrUserName = Split(strUserName, " ")
    intLength = Len(arrUserName(1))
    strName = Left(arrUserName(1), intlength - 1)

    strUserName = strName & ", " & arrUserName(0)

    strText = objDoc.Paragraphs(2).Range.Text
    arrText = Split(strText, vbTab)
    intIndex = Ubound(arrText)

    strDate = arrText(intIndex)
    strDate = Replace(strDate, "/", "")

    intLength = Len(strDate)
    strDate = Left(strDate, intlength - 1)

    strFileName = "C:\Temp\" &  strUserName & " " & strDate & ".doc"

    objDoc.Close
    Wscript.Sleep 2000

    ' objFSO was created at the top of the script, so we can reuse it here.
    objFSO.MoveFile objFile.Path, strFileName
Next

objWord.Quit
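
One thing the folder version takes on faith is that every file in C:\Temp is a Word document. If that might not be true, a sketch like the following could be used to pick out the .doc files first. Gathering the paths into a Dictionary before doing any renaming is a precaution we are assuming is worthwhile, so that renamed files cannot be picked up a second time by the loop; the objPaths and strPath names are just for illustration:

```vbscript
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFSO.GetFolder("C:\Temp")

' Collect the paths of the .doc files before changing anything.
Set objPaths = CreateObject("Scripting.Dictionary")
For Each objFile In objFolder.Files
    If LCase(objFSO.GetExtensionName(objFile.Path)) = "doc" Then
        objPaths.Add objFile.Path, True
    End If
Next

' Work from the saved list rather than the live Files collection.
For Each strPath In objPaths.Keys
    WScript.Echo "Would process: " & strPath
Next
```

In the real script, the body of the second loop would open strPath in Word, build the new name, and call MoveFile, just as before.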

That should do it, TG. By the way, in the interest of full disclosure the Scripting Guys should confess that we once tried to pull a scam very similar to the one tried by the two would-be check cashers. When Scripting Guy Peter Costantini died last year the surviving Scripting Guys put him in a chair and wheeled him down to the Benefits office, hoping to collect his last paycheck and his unused vacation time. Unfortunately, we failed to take into account the fact that we were dealing with Microsoft: they took one look at Peter and immediately promoted him to Program Manager!

Note. OK, as it turns out, Peter didn’t really die last year; he’s as alive and well as ever. In fact, now that we think about it, the whole time we were wheeling him down to the Benefits office he kept insisting that he was still alive. Apparently we’ve just gotten so used to ignoring everything Peter says that we didn’t pay any attention to him. Sorry, Peter!

  • Is there a simple way to do this on multiple files, using just the first line of text to rename each one? No tabs, just a line of text, maybe with a paragraph indent.