Hey, Scripting Guy! How Can I Use Windows PowerShell to Pick Out the Unique File Extensions Used in a Collection of Files?

Hey, Scripting Guy! I’m having a strange problem with Windows PowerShell, and I was wondering if you could help me solve it. I’m trying to get a list of the unique file extensions used by the files in a folder. I have a whole bunch of different file types in that folder, but every time I run my command it returns just one file extension. What am I doing wrong?
-- TH

Hey, TH. You know, this has been a difficult morning for the Scripting Guy who writes this column. To get from the Scripting House to the office, this Scripting Guy has to drive through town, and make his way through 25 or so traffic lights. (Why doesn’t he just take the freeway? That’s easy: because he lives in the Seattle area. To paraphrase Yogi Berra, “No one takes the freeway any more; it’s too crowded.”)

This morning, in what surely ranks as one of the greatest driving feats in recorded history, the Scripting Guy who writes this column made it through the first 22 traffic lights without stopping. Some eighty years after his flight, Charles Lindbergh is still celebrated for being the first person to fly an airplane solo, nonstop, across the Atlantic. Trust us, that’s nothing compared to being the first person to drive a car nonstop from Kirkland to Redmond. With just a few blocks left to go, the Scripting Guy who writes this column was already trying to decide which baseball hat he would wear during the tickertape parade that would no doubt be held in his honor.

Note. In case you’re wondering, he decided to wear the same ratty old hat he’s worn every day for the last four years. As it turns out, it wasn’t that difficult of a decision to make after all.

Unfortunately, though, things were simply not to be: three traffic lights away from immortality, the light turned red and the Scripting Guy who writes this column was forced to stop. Ironically, the light that stopped his push towards fame and fortune just happened to be the first traffic light you encounter after reaching Microsoft.

Note. Is that some sort of metaphor for the life of the Scripting Guys, the fact that, just when everything seemed to be going great, Microsoft up and puts a stop to it? To tell you the truth, we’d better not answer that question.

But in answer to your question, yes it is a metaphor for the life of the Scripting Guys.

As TH has discovered, stoplights aren’t limited to roads and highways; sometimes you encounter a red light in Windows PowerShell as well. TH has a command that looks like this:

Get-ChildItem C:\Scripts | Select-Object Extension | Sort-Object Extension | Get-Unique

This command kicks off by using the Get-ChildItem cmdlet to return a collection of all the files in the folder C:\Scripts. Because TH is interested in only file extensions, he pipes that collection to the Select-Object cmdlet, asking Select-Object to weed out everything except the Extension property. That collection is then piped to the Sort-Object cmdlet, which sorts all the file extensions. Finally, TH uses the Get-Unique cmdlet to echo back a single instance of each file extension found in C:\Scripts.

Make sense? In other words, suppose C:\Scripts has the following set of files:

Test1.doc
Test2.doc
Test3.mdb
Test4.ppt
Test5.ppt
Test6.ppt
Test7.xls
Test8.xls

This set of 8 files uses 4 different file extensions: .doc; .mdb; .ppt; and .xls. Because of that, TH assumes he’ll get back the following report:

.doc
.mdb
.ppt
.xls

Instead, he gets back this:

.doc

Obviously something went wrong. But what?

Note. First, however, let’s relate the only traffic light joke that the Scripting Guy who writes this column knows:

“What did the first stoplight say to the second stoplight?”

“Don’t look at me; I’m changing!”

And yes, it’s probably for the best that this is the only traffic light joke that he knows.

In order to explain the problem TH has run into it helps to know what the Get-Unique cmdlet does. Suppose we have an array like the following:

$a = 5,2,3,2,6,6,2,9,2,2,2,3,5,5,2,2,2,5,4,4,4,3,3,2,2,6,4,4,5,6,2,2,7,2,6,6,3

Now, suppose we need to know which unique numerals are used in that array. How can we determine that? That’s easy: we first sort the array, then pass the sorted list to Get-Unique:

$a | Sort-Object | Get-Unique

In turn, Get-Unique reports back the following:

2
3
4
5
6
7
9

Pretty cool, huh?
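Why bother sorting first? Because Get-Unique only compares each item with the one right before it, so duplicates that aren’t sitting next to each other slip through. Here’s a minimal sketch of that behavior; the array $b is a made-up, shortened sample (it’s not part of TH’s script), used only to keep the output readable:

```powershell
# Hypothetical sample array, much shorter than $a.
$b = 5,2,3,2,5

# Unsorted: no two neighbors are equal, so nothing gets removed.
$unsorted = $b | Get-Unique

# Sorted: duplicates now sit side by side and collapse to one instance each.
$sorted = $b | Sort-Object | Get-Unique

"Unsorted: $unsorted"   # Unsorted: 5 2 3 2 5
"Sorted:   $sorted"     # Sorted:   2 3 5
```

If you’d rather do it all in one step, Sort-Object has a -Unique switch of its own, so $a | Sort-Object -Unique should return the same seven numbers.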

What’s that, you say? Isn’t that the exact same thing that TH did? Well, almost, but not quite; there’s one important difference. At the start of his command, TH used Get-ChildItem to return a collection of all the files found in the folder C:\Scripts; Get-ChildItem is going to bring back a collection of objects, with each object representing a file found in C:\Scripts. (By contrast, there are no objects in our array, just numeric values.) TH then passes those objects to Select-Object and Sort-Object. When that’s done he still has a collection of objects. Admittedly, there’s not a whole lot left of these objects; in fact, what he has are a bunch of file objects that contain a single property: Extension. Nevertheless, these are still objects.
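If you want to see the difference for yourself, here’s a small sketch. It assumes nothing about your hard drive: instead of calling Get-ChildItem it builds a single stand-in FileInfo object for a hypothetical file, C:\Scripts\Test1.doc (the file doesn’t need to exist). The variable names are ours, chosen just for illustration:

```powershell
# A FileInfo object can describe a path without the file existing on disk,
# so this stands in for one item returned by Get-ChildItem C:\Scripts.
$file = New-Object System.IO.FileInfo 'C:\Scripts\Test1.doc'

# Select-Object Extension: a stripped-down object with an Extension property.
$asObject = $file | Select-Object Extension

# Select-Object -ExpandProperty Extension: the bare extension string.
$asString = $file | Select-Object -ExpandProperty Extension

$asObject.GetType().Name   # PSCustomObject
$asObject.Extension        # .doc
$asString.GetType().Name   # String
$asString                  # .doc
```

In other words, what travels down TH’s pipeline is the first kind of thing (an object that has an extension), not the second (the extension itself).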

And that’s why TH’s command seems to fail. (As it turns out, it’s actually doing what it’s supposed to.) TH handed Get-Unique a collection of objects, each object containing a single property. He then asked Get-Unique to report back the unique items in that collection. As it turns out, all the items are identical: they’re all file objects with a single property. Consequently, Get-Unique correctly reports back a single instance of the file object:

.doc

Note. Yes, we know: intuitively that doesn’t make much sense. But think about it for a bit; after a while you’ll understand why it works that way.

So how do we solve TH’s problem? That’s easy; we just need to add Get-Unique’s -AsString parameter:

Get-ChildItem C:\Scripts | Select-Object Extension | Sort-Object Extension | Get-Unique -asString

The -AsString parameter tells Get-Unique to treat each item as a string rather than as an object; because of that, Get-Unique will compare the actual value of the property each item contains rather than looking solely at the object type. In turn, that means that Get-Unique will correctly report back the unique file extensions:

.doc
.mdb
.ppt
.xls

Much better.
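One thing to watch out for: Get-Unique compares strings case-sensitively, so .doc and .DOC count as two different extensions. The sketch below illustrates that with a made-up, mixed-case set of stand-in FileInfo objects (no real files required). Note that it swaps in Select-Object -ExpandProperty, which passes plain strings down the pipeline, rather than TH’s -AsString approach; the file names and variable names are assumptions for illustration only:

```powershell
# Hypothetical mixed-case sample; FileInfo objects don't need files on disk.
$files = 'Test1.doc', 'Test2.DOC', 'Test3.mdb' |
    ForEach-Object { New-Object System.IO.FileInfo "C:\Scripts\$_" }

# Get-Unique is case-sensitive: .doc and .DOC both survive.
$caseSensitive = @($files |
    Select-Object -ExpandProperty Extension |
    Sort-Object |
    Get-Unique)

# Sort-Object -Unique ignores case by default: one entry per extension.
$caseInsensitive = @($files |
    Select-Object -ExpandProperty Extension |
    Sort-Object -Unique)

$caseSensitive.Count     # 3
$caseInsensitive.Count   # 2
```

Which behavior you want depends on the job at hand; if you’re sorting files into folders by extension, the case-insensitive version is probably the one you’re after.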

Incidentally, suppose you wanted to know which file extensions were truly unique; that is, which file extensions are used one time and one time only. As far as we know there’s no way to do that using Get-Unique. However, you can do that using a command similar to this one:

Get-ChildItem C:\Scripts | Group-Object Extension | Where-Object {$_.Count -eq 1} | Select-Object Name

To explain how this command works let’s go back to the array $a and use a similar command to retrieve the truly unique, one-of-a-kind items from $a:

$a | Group-Object | Where-Object {$_.Count -eq 1} | Select-Object Name

As you can see, here we take the variable $a and pipe it to the Group-Object cmdlet; in turn, Group-Object will categorize all the items in the array, like so:

Count Name                      Group
----- ----                      -----
    5 5                         {5, 5, 5, 5...}
   14 2                         {2, 2, 2, 2...}
    5 3                         {3, 3, 3, 3...}
    6 6                         {6, 6, 6, 6...}
    1 9                         {9}
    5 4                         {4, 4, 4, 4...}
    1 7                         {7}

The Count property indicates the number of items in each group; if a group has a Count of 1 that means there’s only one item in that group. In order to grab those particular groups we pipe the data to the Where-Object cmdlet and ask Where-Object to pick out those groups where the Count is equal to 1:

Where-Object {$_.Count -eq 1}

Finally, we use Select-Object to report back the value of the Name property:

Select-Object Name

That should result in the following information being echoed back to the screen:

Name
----
9
7

If you go back to the line of code where we assigned values to the array you should find just one instance of the number 9 and one instance of the number 7.
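The same pattern works on files. Here’s a sketch that applies it to the eight sample file names listed earlier; to keep it self-contained it builds stand-in FileInfo objects rather than reading C:\Scripts, and it uses Select-Object -ExpandProperty Name (our choice, not the command shown above) so that the result is a bare string rather than an object with a Name column:

```powershell
# The eight sample file names; FileInfo objects don't need files on disk.
$files = 'Test1.doc', 'Test2.doc', 'Test3.mdb', 'Test4.ppt',
         'Test5.ppt', 'Test6.ppt', 'Test7.xls', 'Test8.xls' |
    ForEach-Object { New-Object System.IO.FileInfo "C:\Scripts\$_" }

# Group by extension, keep the groups of size 1, return the extension itself.
$oneOff = $files |
    Group-Object Extension |
    Where-Object { $_.Count -eq 1 } |
    Select-Object -ExpandProperty Name

$oneOff   # .mdb
```

Only .mdb shows up, because Test3.mdb is the only file of its kind in the sample set.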

That should take care of things, TH; let us know if it doesn’t. We should also note that while the Scripting Guy who writes this column would have been proud to be the first person to drive nonstop from Kirkland to Redmond, passing through 25 consecutive traffic lights would not have been a world’s record; heck, it wouldn’t have even been a family record.

Which must mean it’s time for another amusing family story. When the Scripting Guy who writes this column was a kid the Scripting Family took a trip to southern California, passing through San Francisco on the way.

Note. And no, the family did not travel by covered wagon; the Scripting Guy who writes this column isn’t that old. Instead, the family took one of those new-fangled horseless carriages.

As it turns out, the Scripting Dad was following a family friend through town, and this family friend had the disconcerting habit of driving through yellow lights. The Scripting Dad didn’t want to get lost, so he dutifully kept pace, something which caused him to run 28 consecutive red lights. That’s a record that will never be broken.

Well, at least not by anyone likely to live to tell the tale.

Comments
  • I am looking to combine a bunch of scripts to clean up a set of drives I have that were file copy backups of a machine that was re-imaged. The process I am looking for is to organize files into folders by extension and then dedupe. I have the dedupe down, and I know how to do the organizing into folders by extension once I get the list of extensions.

    This approach seemed perfect for getting the list of unique extensions that I need as a basis, but once I add " -recurse" to the command below I get a long list of duplicate extensions.

    Get-ChildItem F:\ -recurse | Select-Object Extension | Sort-Object Extension | Get-Unique -asString

    I know why, but I didn't know if there was an easier way than a hash to do it...

    Thoughts??

  • I kinda like this way of doing it:

    Get-Childitem \\server\share\folder -Recurse | WHERE { -NOT $_.PSIsContainer } | Group Extension -NoElement | Sort Count -Desc

    It will return a table like this:

    Count Name
    ----- ----
        6 .doc
        4 .txt
        2 .zip
        2 .vsd
        2 .jnt
        2 .ppt

    Really nice way of figuring out what the users are storing on network shares and subfolders etc.