Hey, Scripting Guy! How Can I Use Windows PowerShell to Replace Characters in a Text File?

Hey, Scripting Guy! Using Windows PowerShell, how can I replace all the asterisks in a text file with some other character?

-- RC

Hey, RC. You know, a lot of people ask the Scripting Guy who writes this column, “How do you do it? How do you manage to write a new column each and every day?” (Of course, lots of other people ask him why he writes a new column each day. But that’s another story.) “Every single day,” they’ll marvel. “Don’t you ever get too sick or too tired to write Hey, Scripting Guy!?”

Believe it or not, the answer to that is no, the Scripting Guy who writes this column never gets too sick or too tired to write Hey, Scripting Guy!; in fact, the Scripting Guy who writes this column is probably the healthiest person in the entire world. Is that due to a rigorous program of diet and exercise? Well, no, not really, not unless you count watching TV as exercise, and not unless doughnuts are now considered part of a healthy diet. Instead, the Scripting Guy who writes this column is healthy for one reason and one reason only: he’s overworked and overstressed.

It’s true: work-related stress is supposedly good for you. Researchers in Europe recently discovered that people are more likely to get sick when they are on vacation than when they go to work every day. The researchers theorized that this is because the stressors in the workplace trigger the body’s defense mechanisms, making it easier for you to ward off sickness and infections. When you’re at home you relax; in turn, your body lets down its guard, and – wham! – before you know it you’ve caught a cold or the flu. To be truly healthy, you need those workplace stressors.

And that’s good news for the Scripting Guy who writes this column. After all, if workplace stressors make you healthy, well, he’ll probably live to be 190 years old. At the very least, he’ll be around – and working – for a long time to come.

Which probably comes as a huge thrill to his old friend the Scripting Editor.

Note. Just think, Scripting Editor: we’ll be a team for many more decades to come! That might not sound like much fun, but just imagine how healthy that should make you.

Considering the fact that we’ve spent the morning trying to figure out how to write Perl scripts (in preparation for the upcoming 2008 Winter Scripting Games), we’re feeling especially … healthy … today. With that in mind, let’s see if we can figure out how to use Windows PowerShell to replace characters in a text file. For example, suppose we have the following text file (C:\Scripts\Test.txt):

This is line 1.*
This is line 2*.
*This is line 3.
This is * line 4.

Let’s further suppose that (for some reason) we want to replace those pesky asterisks (*) with at signs (@). How can we do that? Well, this script should do the trick:

(Get-Content C:\Scripts\Test.txt) | 
Foreach-Object {$_ -replace "\*", "@"} | 
Set-Content C:\Scripts\Test.txt

As you can see, there really isn’t much to this script; in fact, if we had slightly wider Web pages we would have put the whole thing on a single line. We start out by using the Get-Content cmdlet to read in the text from the file C:\Scripts\Test.txt; by default, the text gets read in as an array, with each item in the array representing a single line in the text file. Oh, and notice that we enclosed the Get-Content command in parentheses. Why did we do that? Because that way we can be sure that PowerShell will read in the entire contents of the file, and close the file, before it does anything else. Without the parentheses, Get-Content would still be reading Test.txt when Set-Content tried to write to it, and the script would fail because the file was in use.
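
If you’d like to see that array for yourself, here is a minimal sketch. It assumes a scratch file in the temp folder (the name ScriptingGuyTest.txt is made up for this illustration) rather than C:\Scripts\Test.txt, so nothing of yours gets touched:

```powershell
# Hypothetical scratch file; the column itself uses C:\Scripts\Test.txt.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'ScriptingGuyTest.txt'
Set-Content -Path $path -Value 'This is line 1.*', 'This is line 2*.', '*This is line 3.', 'This is * line 4.'

# Get-Content hands back one string per line in the file.
$lines = Get-Content -Path $path
$lines.Count    # how many lines were read
$lines[0]       # the first line of the file

# Clean up the scratch file.
Remove-Item -Path $path
```

Each element of $lines is an ordinary string, which is why we can hand the lines to Foreach-Object one at a time.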

So what happens after PowerShell has read in the entire contents of the text file? Well, our next step is to pipe those contents to the Foreach-Object cmdlet. As we pointed out a moment ago, when PowerShell reads in the contents of a text file it automatically turns that information into an array. What the Foreach-Object cmdlet will do now is loop through each and every item in that array; in other words, it will loop through each and every line in the text file. And for each of those lines Foreach-Object will execute the following scriptblock:

{$_ -replace "\*", "@"}

As you probably know, in a Windows PowerShell pipeline the $_ represents the current object. In this case, the first time we go through the loop $_ will represent the first line in the text file; the second time we go through the loop $_ will represent the second line in the text file; and so on. For each of these lines (that is, for each of these string values) we’re going to use the -replace operator to replace all the asterisks in the line with an @ sign. To do that we simply specify the pattern to search for (an asterisk), followed by the replacement text (the @ sign).
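
To watch $_ at work without involving a file at all, here is a small sketch that pipes two hard-coded strings (sample data borrowed from Test.txt; the variable names are ours, purely for illustration) through the same scriptblock:

```powershell
# Two sample lines standing in for the contents of a text file.
$sample = 'This is line 1.*', '*This is line 3.'

# Inside the scriptblock, $_ is whichever string is currently coming down the pipeline.
$replaced = $sample | ForEach-Object { $_ -replace '\*', '@' }

$replaced
```

Each string goes in with an asterisk and comes out with an @ sign in the same position.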

The only tricky part here is that the -replace operator treats its search pattern as a regular expression, and the asterisk is a reserved character in regular expressions; because of that we need to “escape” the character before we can perform a search-and-replace operation using that character. How hard is that? Not hard at all; we just need to preface the asterisk (or any other reserved character) with a backslash (\):

"\*"

And yes, that is important. If you leave out the backslash you’ll get an error message like this each time you run through the loop:

Invalid regular expression pattern: *.
At C:\scripts\test.ps1:2 char:33
+ Foreach-Object {$_ -replace "*",  <<<< "@"} |

Of course, you must also keep in mind that the only reason we need to escape the asterisk is that the asterisk is a reserved character in regular expressions. If you want to search for something that isn’t a reserved character then make sure you leave the \ off:

{$_ -replace "a", "@"}
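
Here is a quick side-by-side sketch of what the backslash buys you. The sample string and variable names are made up for the occasion; the period is in there because it is another reserved character, and a sneaky one, since leaving it unescaped doesn’t cause an error:

```powershell
$text = 'a.b*c'

# Reserved character, escaped: only the literal asterisk is replaced.
$literalStar = $text -replace '\*', '@'    # a.b@c

# Ordinary character: no backslash needed.
$plainLetter = $text -replace 'a', '@'     # @.b*c

# Reserved character, NOT escaped: a bare period matches any character at all.
$unescapedDot = $text -replace '.', '@'    # @@@@@

# Reserved character, escaped: only the literal period is replaced.
$escapedDot = $text -replace '\.', '@'     # a@b*c
```

In other words, forgetting the backslash doesn’t always produce an error message; sometimes it just quietly replaces more than you bargained for.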

So, in other words, sometimes we need a \ and sometimes we don’t? How are we supposed to know when we need to escape a character and when we don’t? You guys are making my head hurt.

Listen, don’t worry about it; remember, stress is good for you. (Which means that we probably should have said, “Go ahead and worry.” After all worrying is pretty stressful.) You don’t have to remember which characters are reserved characters; we’ll tell you which characters are reserved characters. We can’t say for sure that the following list (taken from MSDN) represents a complete list of reserved characters, but it’s a good place to start:

  • $
  • ()
  • *
  • +
  • .
  • []
  • ?
  • \
  • ^
  • {}
  • |
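
If you’d rather not memorize that list, one option is to let the .NET Framework do the escaping for you. The sketch below uses the static Escape method of the regex class; the variable names are ours, and the approach assumes you want every character in the search string treated literally:

```powershell
# The text we want to find, exactly as typed.
$literal = '*'

# [regex]::Escape puts a backslash in front of any reserved characters.
$pattern = [regex]::Escape($literal)    # \*

# The escaped pattern can go straight into -replace.
$result = 'This is line 2*.' -replace $pattern, '@'
$result
```

This comes in especially handy when the search text arrives in a variable and you have no idea ahead of time which characters it contains.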


OK, back to the script. After Foreach-Object finishes off its search-and-replace operation, our virtual text file is then handed off to the Set-Content cmdlet; in turn, Set-Content writes the modified data back to the file C:\Scripts\Test.txt:

Set-Content C:\Scripts\Test.txt

And that’s it; at that point we’re done.

You know, you’re right: that did seem a little too easy, didn’t it? Well, let’s take a peek at Test.txt and see what happened, if anything:

This is line 1.@
This is line 2@.
@This is line 3.
This is @ line 4.

Well, what do you know: it really was that easy, wasn’t it?
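
If you want to try the whole round trip without risking a real file, here is a self-contained sketch. It is the same three-part pipeline as above, pointed at a scratch file in the temp folder (the file name is hypothetical) instead of C:\Scripts\Test.txt:

```powershell
# Hypothetical scratch file with the same four lines as Test.txt.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'ScriptingGuyRoundTrip.txt'
Set-Content -Path $path -Value 'This is line 1.*', 'This is line 2*.', '*This is line 3.', 'This is * line 4.'

# Read everything (note the parentheses), replace, and write back to the same file.
(Get-Content -Path $path) |
    ForEach-Object { $_ -replace '\*', '@' } |
    Set-Content -Path $path

# Read the file again to see what happened, then clean up.
$after = Get-Content -Path $path
$after
Remove-Item -Path $path
```

The lines in $after should match the listing shown above, with every asterisk swapped for an @ sign.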

We hope that answers your question, RC; if it doesn’t, please let us know. In the meantime, the Scripting Guy who writes this column is in a bit of a quandary. He was going to go home early today, which sounded like way more fun than working. The only problem is this: each moment away from work increases the chances that this Scripting Guy will get sick. In fact, if he truly wants to live forever (and he does), he should stay late and work overtime each night. And on weekends. And holidays. And ….

Hmmm …. Maybe those researchers should recheck their calculations. After all, it’s quite possible that putting in long hours and skipping vacations don’t make you live forever; they just make it seem like you’ve lived forever.

But don’t worry about that, either: we’ll just go ask Peter Costantini, the oldest living Scripting Guy. Peter should know; after all, he pretty much has lived forever.

Leave a Comment
  • Nevertheless, it is a good example.

    Let’s say there is some white space at the end of each line.

    How can we make use of trim when these arrays contain reserved characters?

  • Do I have to create a new line for each character I'd like to replace?  For example, if I have < and > and " I'd like to replace, do I have to have three lines each with a different replace operation?

  • Nice blog! One problem I get is that when I use Select-String after Get-Content, longer lines are automatically split in two (a newline character is inserted at the width of the screen). Also, is there anything in PowerShell, or any DLL, that is as efficient as sed or awk in Unix?

  • I like it!  Thanks.

    Could this technique be tweaked to work with large files?  I have a 200MB data file with dodgy characters (e.g., commas and pipes) I need to strip out before I can reliably turn it into a CSV for further processing.

    First time I tried your suggested basic bit of code above it eventually bombed out with an out-of-memory error.  I'm trying again now with a "-readcount 1000"...

    (I am an absolute PowerShell newbie, but keen to pick it up.)

    <time passes>

    ...so 15 minutes later it had got 59.86% of the way through the file, and stopped with:

    Set-Content : Exception of type 'System.OutOfMemoryException' was thrown.

    So it would seem it doesn't like files over about 100MB.  Can you suggest any alternatives or additional parameters that might  do the trick?

    Thanks

  • New to powershell... I need to remove the first 10 characters from every line within a text file (we'll call it chapter.txt). This is what I have:

    Chapter01=01

    Chapter02=02

    Chapter03=03

    ...

    ..

    Chapter32=32

    I need them to look like:

    01

    02

    03

    ...

    ..

    32

    Any way to do this with powershell? Or do I need to explore other methods like unix commands?

  • m1kehunter,

    $lines = @()

    $text = Get-Content -Path .\chapter.txt

    $text | %{ $lines += $_.SubString(10,$_.Length-10) }

    Set-Content -Path .\NewChapter.txt -Value $lines

    Jeremy

  • this is a test. let's see if it works!

  • The regex class has a static method called Escape.  To get the escaped version of a string (assuming you want to escape characters indiscriminately) use [regex]::Escape($stringToBeEscaped).  Of course, eventually you'll want to use the full power of regular expressions, you'll learn the syntax and you'll know what needs escaping without thinking too hard about it.  Even then, [regex]::Escape can come in handy from time to time.

  • Very good explanation.

    Could you explain all the regular expressions here? Thanks.

  • Very good explanation.

    Could you provide an example that replaces a string in all the text files under a folder? Thanks.

  • Putting this to good use.

  • please please please please be specific. We don't care about anything BUT powershell help. We don't wanna hear about all the other crap you write. I did a powershell project about a year ago and remember going through your blogs and sifting out important information through your detailed article where you wrote about everything and powershell. I'm sorry, I don't think you're interesting or funny, I'm being honest. If I wanted to read a fun upbeat article I'd go to facebook. So coming to the point............ as you also should like RIGHT AWAY...... I'm getting this error:

    "The process cannot access the file 'filename' because it is being used by another process".

    What should I do???

  • @kurninja - you are a ninja.  Roll with it.

    Ha! Ha!  You need to learn some Windows.  The error is absolutely clear.  You cannot use a file that is in use.  It is just like a ninja cannot cut the air with a dull knife.  A sharp knife and a less than dull attitude will go far towards solving your existential dilemma.

    None of what I have written is intended to be funny.  It is offered to the less than humble student in the most serious of mind.  It is the sound of one mind clapping.

    Heh!

    ;)

  • I did this for a *.doc file; and after doing so, LibreOffice can't open the *.doc file claiming it is damaged.

    Thanks.