Remove Unwanted Quotation Marks from CSV Files by Using PowerShell

Remove Unwanted Quotation Marks from CSV Files by Using PowerShell

  • Comments 11
  • Likes

Summary: Learn how to remove unwanted quotation marks from a CSV file by using Windows PowerShell.

 

Microsoft Scripting Guy Ed Wilson here. The fall is rapidly falling down here in Charlotte, North Carolina, in the United States. It is cooling off here, and is around 60 degrees Fahrenheit (15.5 degrees Celsius, according to my conversion module). The neighbors have all been wracked with anguish over whether to tough it out for a few more days, or to go ahead and turn on the heater. It is, evidently, such a convoluted decision that they have made numerous Facebook postings about it. But I guess I would rather read about anguished decisions about to turn on the heater or not, than to read about the latest nonferrous material that their pet consumed. 

The Internet is cool. For example, I really enjoy comments that are posted on the Hey, Scripting Guy! Blog. In fact, I have subscribed to a RSS feed that alerts me any time that a new comment is made, regardless of how old the article may be. Just this morning, I replied to two comments from readers on postings that were more than five years old; that’s 35 in doggy years, and more like 50 in Internet years.

Anyway, the other day, I wrote a pretty cool article named, Use PowerShell to Append CSV files Easily.In the comments about that article, Jeffrey S. Patton from Kansas (yes Toto, he is from Kansas) posted a asked why I used Notepad to remove commas instead of programmatically replacing the commas in the output with Windows PowerShell. I answered that I did it because opening the file in Notepad, clicking Edit, and then clicking Replace was faster and easier than writing the extra code to replace the quotation marks. In a one-off scenario, that is probably true. But what if this is something that needs to be scheduled on a regular basis? If it is an automation scenario, opening the file in Notepad is off the table; as cool as Notepad is, it does not have an automation interface, so it is not a scriptable solution.

What exactly is the problem? When using the Export-CSV cmdlet, it automatically adds quotation marks around everything. In some cases, when reading this file, the quotation marks could be interpreted as part of the data, which of course would be a problem. It is rare, but it most certainly could happen. In fact, this very problem is one reason I quit using Microsoft Excel to manipulate CSV files: it has the same “feature.” This happens because if you have a space in a column, some programs are not smart enough to respect the space between commas. Therefore, to help prevent an error with a column that has a space in it, quotation marks are added everywhere. The output file produced from our command from yesterday (shown here) is shown in the figure following this code:

dir c:\fso -Filter *.csv | ? {$_.basename -like 'users?'} | Import-Csv | sort lname,fname | Export-Csv -Path c:\fso\UsersConsolidated.csv –NoTypeInformation

Image of output file with quotation marks

Jeffrey was absolutely correct when he said I could use replace to remove the quotation marks from the output file. It is actually pretty simple.

Here are the steps I need to perform:

  1. Use the Get-Content cmdlet to read the text of the usersconsolidated.csv file.
  2. Use the Foreach-Object cmdlet (% is an alias) to read each line as it comes from the file.
  3. Inside the script block for the Foreach-Object command, use the $_ automatic variable to reference the current line and the replace operator to replace a quotation mark with nothing.
  4. Use the Out-File cmdlet to overwrite the existing usersconsolidated.csv file with the newly changed content that no longer contains the quotation marks.
  5. Use the force switched parameter to tell the Out-File cmdlet to overwrite existing content.
  6. Use the encoding parameter to specify ASCII encoding to maintain compatibility with legacy applications.

The command line I use is much simpler than the six steps above would make it seem. Here is the command:

(Get-Content C:\fso\UsersConsolidated.csv) | % {$_ -replace '"', ""} | out-file -FilePath C:\fso\UsersConsolidated.csv -Force -Encoding ascii

The preceding command is a single-line command that has wrapped. No line continuation marks are used in the command. The parentheses around the Get-Content cmdlet are required (otherwise, it creates a blank file). I can actually shorten this command by searching for additional aliases. I use the Get-Alias cmdlet to search for aliases. The cool thing is that it will accept an array for input, and therefore I can look for aliases for both Get-Content and Out-File at the same time. Here is the command:

Get-Alias -Definition get-content, out-file

The bad thing about the Get-Alias cmdlet is that it returns an error when no match appears. This behavior is shown in the following figure.

Image showing error when no match appears

The shortened version of the command uses the gc alias for the Get-Content cmdlet. It also uses positional parameters and partial parameter names.

(gc C:\fso\UsersConsolidated.csv) | % {$_ -replace '"', ""} | out-file C:\fso\UsersConsolidated.csv -Fo -En ascii

As shown in the following figure, there are no more quotation marks in the file.

Image of output file with no quotation marks

If you want to do this all in a single command, it is easier to switch to using the ConvertTo-CSV cmdlet instead of the Export-CSV cmdlet. The reason is that ConvertTo-CSV converts to CSV format, but does not export to a file. This allows time to replace the quotation marks prior to writing to file. The revised command is shown here:

dir c:\fso -Filter *.csv | ? {$_.basename -like 'users?'} | Import-Csv |  sort lname,fname | convertto-csv -NoTypeInformation | % { $_ -replace '"', ""} | out-file c:\fso\usersconsolidated.csv -fo -en ascii

There is no alias for the ConvertTo-CSV cmdlet, but there is an alias for Import-CSV, which is ipcsv. I can also use ls as an alias for Get-ChildItem, instead of the lengthier dir alias. I can also shorten filter to fi (I cannot use the single letter f because there is also a parameter force defined for the Get-ChildItem cmdlet). Here is the shortened command:

ls c:\fso -Fi *.csv | ? {$_.basename -like 'users?'} | ipcsv | sort lname,fname | convertto-csv -NoTypeInformation | % { $_ -replace '"', ""} | out-file c:\fso\usersconsolidated.csv -fo -en ascii

Well, that is about all there is to removing quotation marks from a CSV file. Join me tomorrow for more Windows PowerShell fun.

 

I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy

 

 

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
  • :-D Indeed I am from Kansas! But Dorothy's house is quite a bit south from where I am.

  • Hi Ed,

    the solution works, of course!

    But if we are nit-picky ... it does probably more than wanted!

    Quotation marks inside the string ( it maybe unusual, but ... ) are replaced too!

    If you have a value like a password e.g. ( which shouldn't be clear text ... .-) it may well look like:

    Kl@us!"§456

    and we may better want to preserve the " in it!

    So a replacement string like

    -replace '^"(.*)"$', '$1'

    which deletes only quotation marks in the beginning and end of the string might provide a better solution in this case ... if you don't want to keep your admins busy resetting passwords and unlocking accounts ...

    Klaus.

  • A legal CSV escapes quotes in strings with a double quote.  

    Quotes are never an issue with a legal CSV.  Some custom CSV building software, mostly scripts, fail to escape the quotes.  Quotes are required for any field that may have a comma.

  • @Klaus Schulte you are absolutely correct that the replace operator will indeed accept a regular expression pattern. It certainly makes sense to use regex if you have a situation where you might have quotation marks inside the quotation marks.

    @JV this also makes sense. The quotation marks are not a problem for the way Windows PowerShell or Microsoft Excel treat CSV files. This scenario IS one however that I have had several emails about ... so it is a problem for some people in certain cases ... one of those cases may be very well be scripts.

  • If you open a .CSV in excel it will automaticly remove the "". Like "name2","lol","test", will get name2,lol,test,

    the problem ist that you cant use them now as an variable in powershell because there are no quotes.

    Now can i add the quotes after editing the file in excel?

  • This kind of fix should never be necessary.

  • This fix was very necessary for us, exporting a csv from powershell and then importing into SQL would leave quotes on all the data fields.  You could remove the quotes by using a " as a delimiter but the column name would still have the quotes thus making it very difficult to import.  I found thse replacement strings very valuable.  Thank you.

  • THX

    (gc C:\fso\UsersConsolidated.csv) | % {$_ -replace '"', ""} | out-file C:\fso\UsersConsolidated.csv -Fo -En ascii

    Works well.

  • I agree that this kind of fix should never be necessary, but it is. Constantly.

    I support a database application that imports text files we receive from clients. These clients are large businesses with actual IT departments and staff, not just mom and pop shops, and the csv files we get are almost universally terrible.

    I have to have an hour long phone call each week with almost every new client we get for at least the first few weeks where I have to explain to people that have supposedly been doing IT for ten and twenty years, exactly what is and is not a valid csv file, and why their latest attempt at it is wrong.

    Powershell and the robustness and ease with which it deals with csv's is going to give me large portions of my life back.

    I'm excited!

  • Hi, I have a test.csv file, the conten this fil is:

    "AAA0010","A145701"

    "PAB0010","A153599"

    "VAA0010","A100074"

    I need this files content to be like this:

    AAA0010: A145701

    PAB0010: A153599

    VAA0010: A100074

    Move this this file from c:\test.csv to c:\test2\test.xt

    In the end save this as a txt file in a folder.

    Can you please help me, I have tried:

    PS C:\> (gc C:\test.csv) | % {$_ -replace '"', ""} | out-file C:\test2\test.txt -Fo -En ascii

    This script makes a txt file with content:

    AAA0010,A145701

    PAB0010,A153599

    VAA0010,A100074

    How can I replace , with : and a space

    Regards

    Kraco

  • Ed, you're a savior!
    I've been battling all day to find a way to add a column to a CSV file.
    I'm using highcharts to display latency data from our storage system. Highcharts expects the data to be row oriented i.e. series name in the first column and series data in the remaining columns.
    Powershell operates on data in a row oriented fashion so I first had to add a new data point to each record in my array; painful, but not impossible. Although I've had to change my column headings from a numeric representation of the date to alpha numeric as Powershell doesn't seem to like object property names to start with a number (is that true?).
    Having got this far I was then stumped when my webpage displayed nothing! But a quick look at the CSV file highlighted the culprit where is were this post came to the rescue.
    This post isn't the first I've read today to help me solve this little challenge so thanks for this and all the others!
    Mark