Learn about Windows PowerShell
Summary: Learn how to use a Windows PowerShell command to search easily for information in a collection of files.
Hey, Scripting Guy! I need to be able to parse multiple files for text that are in a single folder. I hate to have to write a script for such a common task, but I am afraid I will have to do so. Can you help me?
Microsoft Scripting Guy Ed Wilson here. The Scripting Wife and I are trying to get things sorted out this week before we leave for Corpus Christi, Texas, where I will be teaching a Windows PowerShell class. In addition, we will be appearing at the inaugural meeting of the Corpus Christi PowerShell User Group. If you will be in South Texas on August 9, 2011, you should come check it out. It should be great fun. Oh, by the way, I am doing a meeting today with the Lincoln SQL Server User Group (SSUG). The meeting will be available via Live Meeting. It feels like this week started late and will end early. Luckily, SH, the answer to your question is no, you do not have to write a script to parse a folder full of files for a particular string. In fact, it was a topic that was tested in Beginner Event 6 of the 2011 Scripting Games.
The solution is to use the Select-String cmdlet. One thing to keep in mind is that the Select-String cmdlet reads text files; it cannot read the more complicated file types such as .doc and .docx files that are generated by Microsoft Word. When I attempted to search a folder containing the Word documents and pictures that make up a typical Hey, Scripting Guy! Blog post, Windows PowerShell displayed a bunch of gibberish in the console, and then locked up. This is shown in the following figure.
The easy way to avoid producing gibberish is to specify the file types you want to search. For example, if I want to search all text files in the c:\fso directory for a pattern of ed (such as my first name), I include a wildcard character in my path specification, and choose any file that has the file extension of .txt. The nice thing about the Select-String cmdlet is that it expects the path as well as the pattern parameter to be strings, so I do not need to use quotation marks for either the pattern or the path. I can use the following command to search the c:\fso folder for files that have the .txt file extension, and contain a pattern match for ed:
Select-String -Path c:\fso\*.txt -pattern ed
The command and associated output are shown in the following figure.
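Each match that Select-String returns is an object, not just a line of text, so it can be trimmed down to the properties of interest. The following is a minimal sketch, not the command from the figure. The scratch folder and the names.txt file are hypothetical stand-ins for c:\fso so that the example runs anywhere. Note that the pattern is treated as a regular expression and is case-insensitive by default, which is why ed matches Ed.

```powershell
# Hypothetical scratch folder standing in for c:\fso
$folder = Join-Path ([System.IO.Path]::GetTempPath()) 'fso-demo-1'
New-Item -ItemType Directory -Path $folder -Force | Out-Null

# Hypothetical sample file with three lines
Set-Content -Path (Join-Path $folder 'names.txt') -Value 'Ed Wilson', 'Teresa Wilson', 'Scripting Guy'

# Search every .txt file in the folder for the pattern ed
$found = Select-String -Path (Join-Path $folder '*.txt') -Pattern ed

# Show only the file name, the line number, and the matching line
$found | Select-Object Filename, LineNumber, Line
```

If a literal search is wanted instead of a regular expression, the -SimpleMatch switch can be added to the Select-String command.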
If I use the Get-Command cmdlet (gcm is an alias for this cmdlet) to examine the syntax for the Select-String cmdlet, I see that both the path and the pattern parameters will accept an array of strings. This means that I can use the wildcard character trick with the file extensions to look for multiple files at the same time. To examine only the syntax of the Select-String cmdlet, I used the Get-Command cmdlet and piped the output to the Select-Object cmdlet (select is an alias). I then chose to expand the definition property. The resulting command is shown here:
gcm select-string | select -expand definition
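Another way to see the same information, sketched here as an alternative rather than as the approach used above, is the -Syntax switch of Get-Command. The Parameters property also makes it possible to check the parameter types directly. The variable name $params is only for illustration.

```powershell
# Print only the syntax of the cmdlet
Get-Command Select-String -Syntax

# Look up the declared type of the Path and Pattern parameters
$params = (Get-Command Select-String).Parameters
$params['Path'].ParameterType.Name
$params['Pattern'].ParameterType.Name

# An array type means several values can be supplied, separated by commas
$params['Path'].ParameterType.IsArray
$params['Pattern'].ParameterType.IsArray
```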
Because I can supply an array of strings to the path parameter, I can search for both .log files and .txt files at the same time. In my revised Select-String command, I search the c:\fso folder for both .txt and .log files. I look inside both types of files for a pattern match of ed. The revised command is shown here:
Select-String -Path c:\fso\*.txt, c:\fso\*.log -pattern ed
Because the pattern parameter also accepts an array of strings, I can also search the .txt and the .log files for both ed and teresa strings. The command to search the c:\fso folder for both .txt and for .log files, and to look for pattern matches with both ed and teresa is shown in the following figure.
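In text form, the command would presumably look like Select-String -Path c:\fso\*.txt, c:\fso\*.log -Pattern ed, teresa. The sketch below shows the same idea in a form that runs anywhere. It assumes a hypothetical scratch folder and two hypothetical sample files in place of c:\fso.

```powershell
# Hypothetical scratch folder standing in for c:\fso
$folder = Join-Path ([System.IO.Path]::GetTempPath()) 'fso-demo-2'
New-Item -ItemType Directory -Path $folder -Force | Out-Null

# Hypothetical sample files: one .txt file and one .log file
Set-Content -Path (Join-Path $folder 'names.txt') -Value 'Ed Wilson', 'Teresa Wilson'
Set-Content -Path (Join-Path $folder 'visitors.log') -Value 'call from teresa', 'nothing here'

# Two paths and two patterns, each supplied as an array
$paths = (Join-Path $folder '*.txt'), (Join-Path $folder '*.log')
$found = Select-String -Path $paths -Pattern ed, teresa

# The Pattern property shows which search term matched each line
$found | Select-Object Filename, LineNumber, Pattern, Line
```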
In addition to directly using the path parameter in the Select-String cmdlet, it may be easier to use the Get-ChildItem cmdlet for more granular control over the files to be parsed. In the following command, I use the dir command (an alias for the Get-ChildItem cmdlet) and provide the path of c:\fso (the -Path parameter name does not appear in the short form of the command because path is the default positional parameter). I include only .txt and .log files (I use -I and rely on parameter completion to specify the include parameter). I do the same thing with the recurse switch (in that I just use the letter r). I pipe the results to the Select-String cmdlet and look for the pattern fail (pattern is the default positional parameter, and therefore its name is omitted in the short form of the command). The long version of the command is shown here:
Get-ChildItem -Path c:\fso -Include *.txt, *.log -Recurse | Select-String -Pattern fail
Here is an example of the shorter form of the command.
dir c:\fso -I *.txt, *.log -R | Select-String fail
The command and associated output are shown here.
Interestingly enough, the above output displays information from an install.log file, and it shows a bunch of failures. I decide that I would like to see the successes as well as the failures. I modify the command by adding the word success to the pattern. The revised command is shown here:
dir c:\fso -I *.txt, *.log -R | Select-String fail, success
As I look over the output from the previous command, I see a pattern appearing: on all the servers, the installation failed. On the client computers, the installation was a success. But I am missing my Windows XP computers in the output. I decide to add the word pending to my array of search terms. Here is the revised command:
dir c:\fso -I *.txt, *.log -R | Select-String fail, success, pending
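When several search terms are in play, it can help to count how many lines matched each one. The following is a minimal sketch of that idea, assuming a hypothetical scratch folder and a made-up install.log in place of the real files in c:\fso. It groups the results on the Pattern property of each match.

```powershell
# Hypothetical scratch folder standing in for c:\fso
$folder = Join-Path ([System.IO.Path]::GetTempPath()) 'fso-demo-3'
New-Item -ItemType Directory -Path $folder -Force | Out-Null

# Made-up log content for illustration only
Set-Content -Path (Join-Path $folder 'install.log') -Value @(
    'SERVER1 install failed',
    'CLIENT1 install success',
    'XP1 install pending',
    'CLIENT2 install success'
)

# Same search as above, long form
$found = Get-ChildItem -Path $folder -Include *.txt, *.log -Recurse |
    Select-String -Pattern fail, success, pending

# Count the matching lines for each search term
$summary = $found | Group-Object -Property Pattern | Select-Object Name, Count
$summary
```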
Well, SH, thank you for your question. I hope I have encouraged you to spend a bit more time exploring the Select-String cmdlet.
I invite you to follow me on Twitter and to join the Scripting Guys on Facebook. If you have any questions, send email to me at firstname.lastname@example.org, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.
Ed Wilson, Microsoft Scripting Guy
One of the biggest gains we get with PowerShell is the ease of supplying arrays or scalars interchangeably.
You might have a single item to work with or multiple items separated by a comma.
That's really an achievement!
How can I use Select-String for xlsx files?
@Mao - you can't; they are compressed files. Look on the Internet for instructions on how to extract the XML from the XLSX file.
Here is a starter on how to extract the XML. Maybe Ed will blog this one day. I am not completely up-to-date on how all of this goes together.
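As a rough sketch of the idea (not the starter referred to above), the .NET ZipFile class can read one XML part out of the compressed file, and that text can then be piped to Select-String. The function name Get-ZipEntryText is hypothetical, and the demo archive is a tiny stand-in built for illustration, not a real workbook. In a real .xlsx file, cell text is usually stored in the xl/sharedStrings.xml part. This assumes Windows PowerShell 3.0 or later with .NET Framework 4.5.

```powershell
Add-Type -AssemblyName System.IO.Compression
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: return the text of one entry inside a zip-based file
function Get-ZipEntryText {
    param([string]$Path, [string]$EntryName)
    $zip = [System.IO.Compression.ZipFile]::OpenRead($Path)
    try {
        $entry = $zip.GetEntry($EntryName)
        if ($null -eq $entry) { return $null }
        $reader = New-Object System.IO.StreamReader($entry.Open())
        try { $reader.ReadToEnd() } finally { $reader.Dispose() }
    }
    finally { $zip.Dispose() }
}

# Build a tiny stand-in archive (not a real workbook) so the sketch runs anywhere
$zipPath = Join-Path ([System.IO.Path]::GetTempPath()) 'demo-workbook.xlsx'
if (Test-Path $zipPath) { Remove-Item $zipPath }
$zip = [System.IO.Compression.ZipFile]::Open($zipPath, [System.IO.Compression.ZipArchiveMode]::Create)
$stream = $zip.CreateEntry('xl/sharedStrings.xml').Open()
$writer = New-Object System.IO.StreamWriter($stream)
$writer.Write('<sst><si><t>Ed</t></si><si><t>Teresa</t></si></sst>')
$writer.Dispose()
$zip.Dispose()

# Extract the XML text and search it like any other string
$xmlText = Get-ZipEntryText -Path $zipPath -EntryName 'xl/sharedStrings.xml'
$found = $xmlText | Select-String -Pattern teresa
$found.Line
```

The match is reported against the whole XML string, so for anything beyond a yes-or-no answer the XML would still need to be parsed.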
I have just read the above and in particular "One thing to keep in mind is that the Select-String cmdlet reads text files" which is part of what I am looking for.
I have written a PowerShell script to search for certain text in the VBA code. It works and is able to search within the .dot files, giving me useful results, but in the same directory there are .dotm files, which do not seem to be searched.
G:\inpro\templates\word> get-childitem -recurse -force | select-string -pattern "agent_attention", "applicant_and_title" | select-object path
I cannot wait to find out why!
This is all great information, but how do you apply it when remoting into a server with security?
Your commands above work well for text files, but will not work with docx documents for obvious reasons. How would you go about searching within docx files for a user-supplied string? Your help with this would be much appreciated, thanks.