Learn about Windows PowerShell
Summary: Learn four easy ways to use Windows PowerShell to import CSV files into SQL Server.
Microsoft Scripting Guy, Ed Wilson, is here. I was chatting this week with Microsoft PowerShell MVP, Chad Miller, about the series of blogs I recently wrote about using CSV files. He thought a helpful addition to the posts would be to talk about importing CSV files into a SQL Server. I most heartily agreed. Welcome to Guest Blogger Week. We will start off the week with a bang-up article by Chad Miller. Chad has previously written guest blogs for the Hey, Scripting Guy! Blog. Here is a little information about Chad:
Chad Miller is a SQL Server database admin and the senior manager of database administration at Raymond James Financial. In his spare time, he is the project coordinator and developer of the CodePlex project SQL Server PowerShell Extensions (SQLPSX). Chad leads the Tampa Windows PowerShell User Group, and he is a frequent speaker at SQL Saturdays and Code Camps.
Contact information: Blog: Sev17 Twitter: cmille19
Windows PowerShell has built in support for creating CSV files by using the Export-CSV cmdlet. However, the creation of a CSV file is usually only a short stop in an overall process that includes loading the file into another system. In this post, we'll look at a few scripted-based approaches to import CSV data into SQL Server. Note: SQL Server includes a component specifically for data migration called SQL Server Integration Services (SSIS), which is beyond the scope of this article.
The T-SQL BULK INSERT command is of the easiest ways to import CSV files into SQL Server. The BULK INSERT command requires a few arguments to describe the layout of the CSV file and the location of file. Let's look at an example of creating a CSV file by using Export-CSV, and then importing the information into a SQL Server table by using BULK INSERT.
Requirements
Setup
1. Download the following script: Invoke-SqlCmd2.ps1
2. Create a table disk space by copying the following code in SQL Server Management Studio.
Note: The example uses a database named "hsg."
CREATE TABLE dbo.diskspace(
UsageDate datetime,
SystemName varchar(50),
Label varchar(50),
VolumeName varchar(50),
Size decimal(6,2),
Free decimal(6,2),
PrecentFree decimal(5,2)
)
The following image shows the command in SQL Server Management Studio.
3. Save the following script as Get-DiskSpaceUsage.ps1, which will be used as the demonstration script later in this post.
param($ComputerName=".")
Get-WmiObject -computername "$computername" Win32_Volume -filter "DriveType=3" | foreach {
new-object PSObject -property @{
UsageDate = $((Get-Date).ToString("yyyy-MM-dd"))
SystemName = $_.SystemName
Label = $_.Label
VolumeName = $_.Name
Size = $([math]::round(($_.Capacity/1GB),2))
Free = $([math]::round(($_.FreeSpace/1GB),2))
PercentFree = $([math]::round((([float]$_.FreeSpace/[float]$_.Capacity) * 100),2))
}
} | Select UsageDate, SystemName, Label, VolumeName, Size, Free, PercentFree
Now we will use the script Get-DiskSpaceUsage.ps1 that I presented earlier. It lists information about disk space, and it stores the information in a CSV file.
./get-diskusage.ps1 | export-csv -Path "C:\Users\Public\diskspace.csv" -NoTypeInformation
The generated CSV file shows that Export-CSV includes a text delimiter of double quotes around each field:
"UsageDate","SystemName","Label","VolumeName","Size","Free","PercentFree"
"2011-11-20","WIN7BOOT","RUNCORE SSD","D:\","59.62","31.56","52.93"
"2011-11-20","WIN7BOOT","DATA","E:\","297.99","34.88","11.7"
"2011-11-20","WIN7BOOT",,"C:\","48","6.32","13.17"
"2011-11-20","WIN7BOOT","HP_TOOLS","F:\","0.1","0.09","96.55"
Although many programs handle CSV files with text delimiters (including SSIS, Excel, and Access), BULK INSERT does not. To use BULK INSERT without a lot of work, we'll need to remove the double quotes. We can use a quick and dirty way of simply replacing all the quotes in the CSV file. In the blog post Remove Unwanted Quotation Marks from CSV Files by Using PowerShell, the Scripting Guys explains how to remove double quotes. This method can be used for circumstances where you know it won't cause problems. How do you know? Well, the data being generated from our Get-DiskspaceUsage should never have double quotes or commas in the data. So here's the code to remove the double quotes:
(Get-Content C:\Users\Public\diskspace.csv) | foreach {$_ -replace '"'} | Set-Content C:\Users\Public\diskspace.csv
UsageDate,SystemName,Label,VolumeName,Size,Free,PercentFree
2011-11-20,WIN7BOOT,RUNCORE SSD,D:\,59.62,31.56,52.93
2011-11-20,WIN7BOOT,DATA,E:\,297.99,34.88,11.7
2011-11-20,WIN7BOOT,,C:\,48,6.32,13.17
2011-11-20,WIN7BOOT,HP_TOOLS,F:\,0.1,0.09,96.55
Now we are ready to import the CSV file as follows:
. .\Invoke-SqlCmd2.ps1
$query = @"
BULK INSERT hsg.dbo.diskspace FROM 'C:\Users\Public\diskspace.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')
"@
Invoke-SqlCmd2 -ServerInstance "$env:computername\sql1" -Database hsg -Query $query
The following data shows that our CSV file was successfully imported.
UsageDate
System Name
Label
Volume Name
Size
Free
Percent Free
11/20/2011 12:00:00 AM
WIN7BOOT
RUNCORE SSD
D:\
59.62
31.56
52.93
DATA
E:\
297.99
34.88
11.70
C:\
48.00
6.32
13.17
HP_TOOLS
F:\
0.10
0.09
96.55
BULK INSERT works reasonably well, and it is very simple. However, there are some drawbacks, including:
For these reasons, let's look at some alternate approaches.
LogParser is a command-line tool and scripting component that was originally released by Microsoft in the IIS 6.0 Resource Kit. LogParser provides query access to different text-based files and output capability to various data sources including SQL Server. Even though this little tool hasn't been updated since 2005, it has some nice features for loading CSV files into SQL Server.
Download and install LogParser 2.2.
LogParser can do a few things that we couldn't easily do by using BULK INSERT, including:
Note: CSV files do not need to be local.
Using LogParser
You can use the LogParser command-line tool or a COM-based scripting interface. Let's look at examples of both.
LogParser command-line tool
LogParser requires some special handling, which is why we use Start-Process. Some switches and arguments are difficult to work with when running directly in Windows PowerShell. Also Windows Powershell_ISE will not display output from LogParser that are run via the command-line tool. Here is the syntax for running a command to generate and load a CSV file:
./get-diskspaceusage.ps1 | export-csv -Path "C:\Users\Public\diskspace.csv" -NoTypeInformation -Force
#Uncomment/comment set-alias for x86 vs. x64 system
#set-alias logparser "C:\Program Files\Log Parser 2.2\LogParser.exe"
set-alias logparser "C:\Program Files (x86)\Log Parser 2.2\LogParser.exe"
start-process -NoNewWindow -FilePath logparser -ArgumentList @"
"SELECT * INTO diskspaceLP FROM C:\Users\Public\diskspace.csv" -i:CSV -o:SQL -server:"Win7boot\sql1" -database:hsg -driver:"SQL Server" -createTable:ON
Looking at SQL Server, we see that our newly created table contains the CSV file:
Filename
Row Number
Usage Date
C:\Users\Public\diskspace.csv
2
2011-11-20
3
11.7
4
48
13.16
5
0.1
The CreateTable switch will create the table if it does not exist; and if it does exist, it will simply append the rows to the existing table. Also notice that we got two new columns: Filename and Row Number, which could come in handy if we are loading a lot of CSV files. You can eliminate the Filename and Row Number columns by specifying the column list in the Select statement as we'll see in a moment.
LogParser COM scripting
Using the COM-based approach to LogParser is an alternative method to using the command line. Although the COM-based approach is a little more verbose, you don't have to worry about wrapping the execution in the Start-Process cmdlet. The COM-based approach also handles the issue with Windows Powershell ISE. Here is code to work with the COM object:
$logQuery = new-object -ComObject "MSUtil.LogQuery"
$inputFormat = new-object -comobject "MSUtil.LogQuery.CSVInputFormat"
$outputFormat = new-object -comobject "MSUtil.LogQuery.SQLOutputFormat"
$outputFormat.server = "Win7boot\sql1"
$outputFormat.database = "hsg"
$outputFormat.driver = "SQL Server"
$outputFormat.createTable = $true
$query = "SELECT UsageDate, SystemName, Label, VolumeName, Size, Free, PercentFree INTO diskspaceLPCOM FROM C:\Users\Public\diskspace.csv"
$null = $logQuery.ExecuteBatch($query,$inputFormat,$outputFormat)
The main drawback to using LogParser is that it requires, well…installing LogParser. For this reason, let's look at one more approach.
In my previous Hey, Scripting Guy! post, Use PowerShell to Collect Server Data and Write to SQL, I demonstrated some utility functions for loading any Windows PowerShell data into SQL Server. Let's revisit this solution using the CSV file example:
Download the following scripts:
Run the following code to create a CSV file, convert to a data table, create a table in SQL Server, and load the data:
. .\out-datatable.ps1
. .\Add-SqlTable.ps1
. .\write-datatable.ps1
$dt = .\Get-DiskSpaceUsage.ps1 | Out-DataTable
Add-SqlTable -ServerInstance "Win7boot\Sql1" -Database "hsg" -TableName diskspaceFunc -DataTable $dt
Write-DataTable -ServerInstance "Win7boot\Sql1" -Database "hsg" -TableName "diskspaceFunc" -Data $dt
invoke-sqlcmd2 -ServerInstance "Win7boot\Sql1" -Database "hsg" -Query "SELECT * FROM diskspaceFunc" | Out-GridView
The following image shows the resulting table in Grid view.
The observant reader will notice that I didn't write the information to a CSV file. Instead, I created an in-memory data table that is stored in my $dt variable. This is because by using this approach, there was not a need to create a CSV file, but for completeness let's apply the solution to our CSV loading use case:
$dt = Import-Csv -Path "C:\Users\Public\diskspace.csv" | Out-DataTable
This post demonstrated three approaches to loading CSV files into tables in SQL Server by using a scripted approach. The approaches range from using the very simple T-SQL BULK INSERT command, to using LogParser, to using a Windows PowerShell function-based approach.
Thank you, Chad, for sharing this information with us. It looks like your last four scripts have the makings of an awesome NetAdminCSV module.
I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.
Ed Wilson, Microsoft Scripting Guy
Hi Chad,
this is very interesting and quite a bit surprising, that we can feed CSV files that easily to SQL Server!
The bulk load is my favorite solution while the four PS scripts seem be the most inetersting way to solve the problem with poweshell.
Working with Oracle DBs is quite similar. We have Oracle loader, sql plus or external tables that could easily be used to fulffill the same tasks. Building powershell scripts might as well lead to a similar solution ...
Anyway: Powershell is the glue used to combine "differnt worlds" in a very convenient way!
Klaus.
I really dont understand the purpose of doing this when we have SSIS. SSIS is very efficient at pulling in CSV files. It basically builds the Bulk Copy for you and is much more simple. This also allows you to import the SSIS pack into mssql and run it from agent.
@Charlie
I think you misunderstand the capabilities of SSIS. It cannot easily load any file from anywhere at any time. It is a server-side process an subject to the restrictions on the server.
Client side bulk loading is necessary for maintenance and and-hoc loads. It runs in the context of the user account and machine.
For recurring loads where we do have access to the input files we can use SSIS.
SSIS is never considered a replace meant for bulk loading. Mostly we use BCP which can do all of the above but having an integrated PowerShell method is also very useful at times.
@Klaus
It could be easier, I wish BULK INSERT had a text delimiter parameter.
BTW -- If you haven't checked it out SQLPSX http://sqlpsx.codeplex.com has a OracleBulkCopy function in the OracleClient module.
SSIS is a wonderfully featured tool that's great for ETL, but I'm not sure SSIS is well-suited for scripted-based adhoc CSV file loading. You can't create an SSIS package easily from the command-line and if your CSV file changes structure, for example if we run export-csv on various Powershell commands each time we would need to create a new SSIS package to define the new structure using a GUI tool. SSIS really doesn't like it when we change meta data. In the latter two approaches (LogParser and Powershell functions), we can automatically create and import any CSV file.
Easy way to do it
sqlcopy.blogspot.in/.../bulk-sql-to-sql-sql-to-csv-csv-to-sql.html