
(Note: These solutions were written for Advanced Event 8.)

Advanced Event 8 (Windows PowerShell)


Lee Holmes is a developer on the Microsoft Windows PowerShell team, author of the Windows PowerShell Cookbook and Windows PowerShell Pocket Reference. He also runs the Precision Computing Blog.

-----------

New-TestFile

A problem so nice, we solved it twice.

Once in a while, you’ll run into the need to generate files of a specific size without caring much about what’s in them. We’ve been asked by our friends in the network team to help them with exactly that, and Windows PowerShell is more than up to the challenge.

In understanding their requirements, the first thing that jumps out is a perceived difference between the size of the file that we generate and its size on disk. Windows Explorer reports these as two separate numbers when you view the file properties, and they’ve asked us to guarantee that these remain within 1 percent of each other. The difference in these two numbers comes from the disk’s cluster size, also known as its allocation unit. To reduce the overhead required to manage your disk space, operating systems don’t dole out space to files on a byte-by-byte basis; they allocate space in larger chunks called (creatively) allocation units. For most systems, this is 4,096 bytes. Even if you only create a 1-byte file, Windows sets aside 4,096 bytes. If you create a file that is 4,096 bytes, Windows still sets aside 4,096 bytes. If you create a file that is 4,097 bytes, Windows sets aside 8,192 bytes. In essence, Windows rounds up to the nearest 4,096-byte chunk.

To figure out how much space is being used by a file, try this simple Windows PowerShell command:

    [Math]::Ceiling(<length> / 4096) * 4096

After clarifying this point, the networking team decided that the difference was not important and dropped the requirement that the files be within 1 percent of the “size on disk” as reported by Windows.
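To make the rounding concrete, here is a minimal sketch that wraps the formula in a function. The function name Get-SizeOnDisk and the 4,096-byte default are assumptions for illustration; the real cluster size depends on how the volume was formatted.

```powershell
# Hypothetical helper (not part of the original solution): estimates the
# space a file occupies on disk by rounding its length up to a whole number
# of clusters.
function Get-SizeOnDisk
{
    param(
        [long] $Length,
        [int] $ClusterSize = 4096   # assumed default; varies by volume
    )

    ## Cast to double so the division always yields a fractional result
    [long] ([Math]::Ceiling([double] $Length / $ClusterSize) * $ClusterSize)
}

## Show the length next to the estimated size on disk for a few sizes
foreach($length in 1, 4096, 4097, 100001)
{
    New-Object PSObject -Property @{
        Length = $length
        SizeOnDisk = Get-SizeOnDisk -Length $length
    }
}
```

For the three sizes discussed above, this returns 4,096 for a 1-byte file, 4,096 for a 4,096-byte file, and 8,192 for a 4,097-byte file.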

The broad dynamic range of Windows PowerShell makes this a fun challenge to solve in a number of ways. At the very simplest, we have a snippet to generate random content of whatever size we want:

    $random = New-Object System.Random
    $content = for($i = 0; $i -lt 1mb; $i++) { [char] $random.Next(32, 127) }
    Set-Content file.txt (-join $content)

This works okay for small files, but we’re going to need much better performance if we want to start generating larger files. In addition, this snippet only covers the simplest of the scenario requirements: filling a file one character at a time until it reaches the specified limit. Enabling user-supplied content, while optional, is an equally important problem.

To support user-supplied content, we can just keep on using the Add-Content cmdlet to fill the file with their string as long as space remains for at least one more addition. For our last addition, we’ll have to chop the string down and only add as much as will fit.
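A rough sketch of that Add-Content loop might look like the following. The function name Add-RepeatedContent is hypothetical, and the sketch assumes ASCII-only text (one byte per character) and Windows PowerShell 5.0 or later, because it relies on the -NoNewline parameter to keep Add-Content from appending a line break after every piece.

```powershell
# Hypothetical helper: fills a file with repeated copies of $Text, chopping
# the final copy so the file ends at exactly $Length bytes.
# Assumptions: ASCII-only text, and PowerShell 5.0+ for -NoNewline.
function Add-RepeatedContent
{
    param(
        [string] $Path,
        [int] $Length,
        [string] $Text
    )

    if(-not $Text) { throw "Text must not be empty." }

    $remaining = $Length
    while($remaining -gt 0)
    {
        ## For the last addition, only add as much as will fit
        $piece = $Text
        if($remaining -lt $Text.Length)
        {
            $piece = $Text.Substring(0, $remaining)
        }

        Add-Content -Path $Path -Value $piece -NoNewline -Encoding ASCII
        $remaining -= $piece.Length
    }
}
```

Calling Add-RepeatedContent -Path .\small.txt -Length 1000 -Text "Hello World" on a file that does not exist yet should produce a 1,000-byte file. It is fine for small files, but every pass through the loop pays the full cost of a cmdlet call.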

As we take this approach, however, performance is still a problem. The Add-Content cmdlet is designed for interactive use, so we’re asking it to do a lot of redundant work hundreds of thousands of times: verify its parameters, open the file, find the right place to add content to, close the file, and more. We’re most certainly stepping away from this interactive scenario, so using the file APIs from the .NET Framework is an attractive approach. At the most basic level, you use the $file = [System.IO.File]::OpenWrite() method to open a file, and then $file.Write(…) to write to that file. Finally, you call the $file.Close() method to let the .NET Framework know you’re done with it. This is seen in the following image.

Image of one method to write to file
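Sketched as code, that open, write, close pattern might look like the following. The file name, the temp-directory location, and the sample text are assumptions for illustration; the try/finally wrapper is there so the file gets closed even if a write fails.

```powershell
# Sketch of the OpenWrite / Write / Close pattern. The path and the sample
# text are assumptions made for illustration.
$path = Join-Path ([System.IO.Path]::GetTempPath()) "test.txt"
$bytes = [System.Text.Encoding]::ASCII.GetBytes("Hello World")

## OpenWrite does not truncate an existing file, so start from a clean slate
if(Test-Path $path) { Remove-Item $path }

$file = [System.IO.File]::OpenWrite($path)
try
{
    ## Write(buffer, offset in the buffer, number of bytes to write)
    $file.Write($bytes, 0, $bytes.Length)
}
finally
{
    ## Let the .NET Framework know we are done with the file
    $file.Close()
}
```

Note that the .NET file APIs resolve relative paths against the process working directory rather than the current Windows PowerShell location, which is why the sketch builds a full path first.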

Even with this approach, we have a large opportunity for improvement. If the user gives a small string (such as “Hello World”) for their custom text, we’ll be looping and calling $file.Write(…) an enormous number of times for a large file. In addition, hard drives work best when told to read and write large chunks of data. While most APIs (those in the .NET Framework included) try to batch your work to account for this, we can do a better job ourselves.

On a 100 MB file, batching 10-character chunks up into 16 KB chunks easily takes the performance from 35 seconds to about half a second.

How?

Well, we can create a buffer of a certain disk-friendly size (for example, 16 KB), and then fill the file by writing copies of that buffer instead. To create that buffer, we need to fill it with the user’s custom text (or random data) as long as space remains for at least one more addition. For our last addition, we’ll have to chop the string down and only add as much as…wait! Isn’t that the problem we’re already trying to solve?

Indeed. This is a problem so nice, we solved it twice. The complete script is seen here.

New-TestFile.ps1

    ##############################################################################
    ##
    ## New-TestFile
    ##
    ## by Lee Holmes (http://www.leeholmes.com/blog)
    ##
    ##############################################################################

    <#

    .SYNOPSIS

    Creates a new file of the specified length. The file can be filled with the
    specified TemplateContent, or will be filled with random data if no template
    content is specified.

    .EXAMPLE

    New-TestFile -Path c:\temp\test.txt -Length 1mb -Force

    This example creates a file called test.txt with 1 megabyte of data,
    overwriting the file if it exists.

    #>

    param(
        ## The path of the destination to create
        [Parameter(Mandatory = $True)]
        [string] $Path,

        ## The size of the file to create
        [ValidateRange(0, 1gb)]
        [int] $Length,

        ## The template content to use to fill the file
        [string] $TemplateContent,

        ## Switch to overwrite the file if it exists
        [switch] $Force
    )

    Set-StrictMode -Version Latest

    ## Check if the file exists. Throw an error if it does, but overwrite the
    ## file if they used the -Force switch.
    if(Test-Path $path)
    {
        if($Force)
        {
            Remove-Item $path -Force
        }
        else
        {
            throw "The file '$path' already exists."
        }
    }

    ## Writing to the disk is terribly slow when you do it in small chunks.
    ## Since we'll usually be blasting out large streams of data, we can be
    ## much more efficient by writing it out in larger chunks.
    $chunkLength = 16kb
    $fillBytes = New-Object byte[] $chunkLength

    ## If they gave us some template content, we'll fill the 'fillBytes' buffer
    ## with their text. It's very likely that the text they give us will not
    ## completely fill the buffer, so we have to pack it ourselves.
    if($templateContent)
    {
        ## First, we convert their input to an array of bytes. Normally, we would
        ## use [System.Text.Encoding]::Unicode to get the bytes out of the string
        ## so that our network operators can fill the file with Unicode strings.
        ## However, files filled with Unicode data don't represent typical files:
        ## for most languages, half the file ends up being just zeros.
        $templateBytes = [System.Text.Encoding]::ASCII.GetBytes($templateContent)

        ## Figure out how much of the 'fillBytes' buffer is remaining. We'll start
        ## putting our content at position zero in the buffer, and we'll write
        ## as many bytes as $templateBytes holds.
        $bytesRemaining = $chunkLength
        $currentPosition = 0
        $bytesToWrite = $templateBytes.Length

        ## Now loop, filling up the fillBytes buffer
        while($bytesRemaining -gt 0)
        {
            ## If their input text is larger than the remainder of the buffer,
            ## then we remember to only write as much as will fit.
            if($bytesRemaining -lt $templateBytes.Length)
            {
                $bytesToWrite = $bytesRemaining
            }

            ## Now copy bytes from the templateBytes array into the current
            ## position in the fillBytes array.
            [Array]::Copy($templateBytes, 0, $fillBytes,
                $currentPosition, $bytesToWrite)

            ## Update our position counter (so that we don't overwrite what we've
            ## already written), and update how much space is left.
            $currentPosition += $bytesToWrite
            $bytesRemaining -= $bytesToWrite
        }
    }
    else
    {
        ## They didn't specify any text. We'll just fill the buffer with
        ## completely random data.
        $random = New-Object System.Random
        for($index = 0; $index -lt $chunkLength; $index++)
        {
            $fillBytes[$index] = $random.Next(32, 127)
        }
    }

    ## Now actually create the file. We put this in a try / finally block so
    ## that we have the chance to clean up after errors, or if the user hits ^C.
    ## $file is initialized first so that the cleanup code can safely test it
    ## under strict mode, even if the file was never opened.
    $file = $null
    try
    {
        ## Create the file, and use the .NET API to open the file. There are plenty
        ## of PowerShell-only flavours to this approach, but they build on cmdlets
        ## optimized for interactive use. Because we are generating so much data,
        ## these cmdlets end up spending an enormous amount of time on redundant
        ## tasks: error checking the parameters, opening the file, closing the
        ## file, etc.
        $path = (New-Item -Type File -Path $path).FullName
        $file = [IO.File]::OpenWrite($path)

        ## Now start filling the file, keeping track of how many bytes we have
        ## remaining.
        $bytesRemaining = $length
        while($bytesRemaining -gt 0)
        {
            ## If we don't have enough space left to fit the entire $fillBytes
            ## buffer, we'll create a new array that has only as much as will fit,
            ## and replace $fillBytes with that array.
            if($bytesRemaining -lt $fillBytes.Length)
            {
                $bytesToWrite = New-Object byte[] $bytesRemaining
                [Array]::Copy($fillBytes, $bytesToWrite, $bytesRemaining)
                $fillBytes = $bytesToWrite
            }

            ## Finally, write the bytes to the file, and recalculate how much
            ## we still need to fill.
            $file.Write( $fillBytes, 0, $fillBytes.Length )
            $bytesRemaining -= $fillBytes.Length
        }
    }
    finally
    {
        ## Close the file since we're done with it.
        if($file) { $file.Close() }
    }

    ## New-* cmdlets generally emit the thing they just created. This also lets us
    ## visually verify that it was the correct length.
    Get-Item $path
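If you want to see the effect of batching on your own machine, a small timing harness along these lines can help. This is only a sketch: the Write-InChunks function, the file names, and the 1 MB total are assumptions chosen so it runs quickly, and the timings you get will depend on your hardware.

```powershell
# Hypothetical timing harness: writes the same number of bytes twice, once in
# 10-byte writes and once in 16 KB writes, and reports how long each took.
function Write-InChunks
{
    param([string] $Path, [int] $TotalLength, [byte[]] $Chunk)

    $file = [System.IO.File]::Create($Path)
    try
    {
        $remaining = $TotalLength
        while($remaining -gt 0)
        {
            ## Write a full chunk, or only as much as still fits
            $count = [Math]::Min($remaining, $Chunk.Length)
            $file.Write($Chunk, 0, $count)
            $remaining -= $count
        }
    }
    finally
    {
        $file.Close()
    }
}

$totalLength = 1mb   # kept small so the sketch finishes quickly
$small = [System.Text.Encoding]::ASCII.GetBytes("0123456789")

## Pack copies of the 10-byte pattern into a 16 KB buffer
$large = New-Object byte[] 16kb
for($position = 0; $position -lt $large.Length; $position += $small.Length)
{
    $count = [Math]::Min($small.Length, $large.Length - $position)
    [Array]::Copy($small, 0, $large, $position, $count)
}

$tempDir = [System.IO.Path]::GetTempPath()
$smallPath = Join-Path $tempDir "chunk-small.bin"
$largePath = Join-Path $tempDir "chunk-large.bin"

$smallTime = Measure-Command {
    Write-InChunks -Path $smallPath -TotalLength $totalLength -Chunk $small
}
$largeTime = Measure-Command {
    Write-InChunks -Path $largePath -TotalLength $totalLength -Chunk $large
}

"10-byte writes: {0:N0} ms" -f $smallTime.TotalMilliseconds
"16 KB writes:   {0:N0} ms" -f $largeTime.TotalMilliseconds
```

The two files end up the same length but not byte-for-byte identical: the pattern restarts at every 16 KB boundary in the batched version, just as it does in New-TestFile.ps1.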


Advanced Event 8 (VBScript)


Jakob Gottlieb Svendsen

  • TechNet Influent Denmark, TechNet Moderator
  • Main areas are VBScript, Windows PowerShell, C#.NET, and VB.NET.
  • Working as an IT consultant/Microsoft Certified Trainer at Coretech A/S, Copenhagen, Denmark, www.coretech.dk
  • Blog about scripting and other stuff at http://blog.coretech.dk/author/jgs

How and Why

I always start by going through the job in my head.

In this script we would need something to write to the text files. At the same time we would need something to keep track of the size of the file, and stop when it is large enough.

I thought about different ways to do it. I could calculate and hard-code how many bytes one character in a text file is, and then write as many as needed.
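For comparison, the calculate-up-front idea can be sketched in a few lines of Windows PowerShell. The function name Get-LinePlan is hypothetical, and the sketch assumes ASCII text with a two-byte carriage return/line feed at the end of every line.

```powershell
# Hypothetical helper: works out how many whole lines fit into a target size,
# and how many bytes are left over for a final, shortened line.
# Assumptions: one byte per character (ASCII) and a two-byte CRLF line ending.
function Get-LinePlan
{
    param(
        [long] $TargetSize,
        [string] $Line
    )

    $bytesPerLine = [System.Text.Encoding]::ASCII.GetByteCount($Line) + 2

    $remainder = $TargetSize % $bytesPerLine
    $fullLines = [long] (($TargetSize - $remainder) / $bytesPerLine)

    New-Object PSObject -Property @{
        BytesPerLine = $bytesPerLine
        FullLines = $fullLines
        RemainingBytes = $remainder
    }
}

$line = "Test Data - Test Data - Test Data - Test Data - " +
    "Test Data - Test Data - Test Data - Test Data"
Get-LinePlan -TargetSize 100kb -Line $line
```

The trade-off is that the byte counts are baked into the calculation, so the plan is only right as long as the encoding and line-ending assumptions hold.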

But I decided to go for the “easy” and more direct way. I will write one line at a time, and then check the size afterward, to see if the file is big enough.

I am not using a large string, because that would make it less precise. The problem with using a small string is that when we are creating a 100 MB file, it is going to need a lot of strings and it will take a while. I assumed that the network department would rather have a precise file than a fast process. This means my computer could spend about 20 minutes creating a 100 MB file, but I guess the networking guys can start it and do what they do most of the time (Facebook, Twitter, etc.).

I decided to make the script require a command-line argument, making it easy for the networking guys to change the size when they need to.

The Script Sections

Header section

I always write the header section, containing information about the version history, usage, error codes, and other important information.

This makes it easier for the customers to understand how to use it, and for myself too, when I need to fix it years after coding it!

Declare section

Most of the time I use Option Explicit. This is a good idea, and I have discovered that many enterprise companies like to have control of the script content, and therefore require me to use it. It requires me to explicitly declare all variables.

I try to use Hungarian notation (or something similar) on all my variable names to make it easier for myself.

Main routines section

Because this script is very small, most of the code is in the main section.

First, I check that an argument is present, and that only one argument has been given. This is important because the script would fail if it were not supplied. I read the argument and pass it through the UCase function to make sure that everything is uppercase. If no arguments or more than one argument is supplied, I quit the script with error code 1. The error code makes the script easier to use in automated solutions and batch files.

Next up is the Select Case. Here I check the argument to see if any of the predefined sizes are specified. I always use Select Case when I need to do a simple check on a variable content, because I think it has the best overview. If one of the standard sizes is specified, the nSize variable is set to the correct size in bytes. Otherwise the size is set to the specified bytes.

I make sure the argument is an integer by enabling On Error Resume Next and trying to convert it to an integer. If an error has occurred, I quit the script (error code 2); if no error has occurred, I re-enable halt on error by using the On Error Goto 0 statement, and I put a “B” at the end of the size to make it ready for the filename.
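For readers following the Windows PowerShell solution as well, here is a rough counterpart to this argument handling. It is a sketch, not a translation of the VBScript: the function name ConvertTo-ByteCount is hypothetical, and it uses [long]::TryParse in place of the On Error Resume Next technique.

```powershell
# Hypothetical PowerShell counterpart to the argument handling described
# above: maps the predefined labels to byte counts, and otherwise tries to
# read the argument as a whole number of bytes.
function ConvertTo-ByteCount
{
    param([string] $Size)

    switch($Size.ToUpper())
    {
        "100K" { return 100kb }
        "1M"   { return 1mb }
        "10M"  { return 10mb }
        "100M" { return 100mb }
        default
        {
            ## TryParse reports failure instead of raising an error
            [long] $bytes = 0
            if([long]::TryParse($Size, [ref] $bytes) -and ($bytes -ge 0))
            {
                return $bytes
            }

            throw "Valid sizes are 100K, 1M, 10M, 100M, or a whole number of bytes."
        }
    }
}

ConvertTo-ByteCount "100k"
ConvertTo-ByteCount "123465"
```

The first call returns 102400 and the second returns 123465; anything else that is not a whole number raises an error, which plays the role of error code 2 in the VBScript version.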

Now I assemble the filename, using the strSelectedSize variable that contains the size of the file, prefix, and suffix needed.

Now I am sure that the size and filename are set correctly. Therefore, I create the objects for FSO, TextFile, and File. I could have created the objFSO in the declare section, like the other objects, but there was no reason to do so before the argument had been confirmed.

I use objFSO.CreateTextFile to create the file, supplying the filename and True, allowing the script to overwrite any existing file. I also create a file object with objFSO.GetFile. This is used to check the size of the file.

Now it is just a matter of filling the file, one line at a time, using a while loop, and checking the size every time. When it hits the correct size, the loop ends.

House cleaning

I always do house cleaning (in my scripts!).

It might not be necessary in this script, but it sometimes is very important to clean up the connections/objects correctly. Therefore, I always run the Close method, and set all objects to Nothing.

When the script runs, we see the file properties detailed in the following image.

Image of files properties shown when script is run

The complete script is seen here.

AdvancedEvent8.vbs

    ' //***************************************************************************
    ' // ***** Script Header *****
    ' //
    ' // Solution: 2010 Scripting Games Advanced Event 8
    ' // File: AdvancedEvent8.vbs
    ' // Author: Jakob Gottlieb Svendsen, Coretech A/S. jgs@coretech.dk
    ' // Purpose: Create dummy text files in specified sizes
    ' // Loops every 60 seconds
    ' //
    ' // Usage: cscript.exe AdvancedEvent8.vbs 100K
    ' //        cscript.exe AdvancedEvent8.vbs 1M
    ' //        cscript.exe AdvancedEvent8.vbs 10M
    ' //        cscript.exe AdvancedEvent8.vbs 100M
    ' //        custom size in bytes:
    ' //        cscript.exe AdvancedEvent8.vbs 123465
    ' //
    ' // CORETECH A/S History:
    ' // 0.0.1 JGS 26/03/2010 Created initial version.
    ' //
    ' // Customer History:
    ' //
    ' // ErrorCodes:
    ' // 1: Wrong number of argument supplied
    ' // 2: Argument is not a standard size (100K etc) or an integer.
    ' // ***** End Header *****
    ' //***************************************************************************
    '//----------------------------------------------------------------------------
    '//
    '// Global constant and variable declarations
    '//
    '//----------------------------------------------------------------------------
    Option Explicit 'always using option explicit

    Dim nSize, strTargetFile, strSelectedSize
    Dim objFSO, objTextFile, objFile

    '//----------------------------------------------------------------------------
    '// Main routines
    '//----------------------------------------------------------------------------

    'Count arguments, quit with errorcode if wrong
    If WScript.Arguments.Count = 1 Then
        ' Read argument to variable, using Ucase to make sure it is upper case
        strSelectedSize = UCase(WScript.Arguments.Item(0))
    Else ' No arguments in command line, quit with errorcode 1
        WScript.Echo "Please provide one argument with the preferred Size, valid arguments are: 100K, 1M, 10M, 100M or a integer number (120345 etc.)"
        WScript.Quit(1)
    End If

    'Select the correct size, otherwise use custom size in bytes.
    Select Case strSelectedSize
        Case "100K"
            nSize = 100 * 1024
        Case "1M"
            nSize = 1 * 1024 * 1024
        Case "10M"
            nSize = 10 * 1024 * 1024
        Case "100M"
            nSize = 100 * 1024 * 1024
        Case Else
            'Enable resume on error, since we need to test the conversion of
            'argument to integer.
            On Error Resume Next
            'CLng rather than CInt: CInt only accepts values from -32768 to
            '32767, so a byte count such as 123465 would be rejected.
            nSize = CLng(WScript.Arguments.Item(0))
            If Err.Number <> 0 Then ' if wrong format, write message and quit with errorcode 2
                WScript.Echo "Argument wrong format valid formats are: 100K, 1M, 10M, 100M or a integer number (120345 etc.)"
                WScript.Quit(2)
            End If
            'add a B for bytes, since it is used in the name of the text file
            strSelectedSize = strSelectedSize & "B"
            On Error Goto 0 ' reenable break on error
    End Select

    strTargetFile = "TestFile" & strSelectedSize & ".txt" 'Setup filename from argument.

    'Create FSO, Textfile and File object to keep track of the size of the file.
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    'using CreateTextFile with overwrite enabled
    Set objTextFile = objFSO.CreateTextFile(strTargetFile, True)
    Set objFile = objFSO.GetFile(strTargetFile)

    'Write lines with test data until the specified size is reached.
    Do While objFile.Size <= nSize
        objTextFile.WriteLine "Test Data - Test Data - Test Data - Test Data - Test Data - Test Data - Test Data - Test Data"
    Loop

    '//----------------------------------------------------------------------------
    '// House Cleaning
    '//----------------------------------------------------------------------------

    'Close the file
    objTextFile.Close

    'Remove objects from memory
    Set objTextFile = Nothing
    Set objFile = Nothing
    Set objFSO = Nothing

    '//----------------------------------------------------------------------------
    '// End Script
    '//----------------------------------------------------------------------------

 

If you want to know exactly what we will be looking at tomorrow, follow us on Twitter or Facebook. If you have any questions, send e-mail to us at scripter@microsoft.com or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson and Craig Liebendorfer, Scripting Guys